Methods and systems for reducing blocking artifacts with reduced complexity for spatially-scalable video coding

ABSTRACT

A method for characterizing a block boundary between neighboring blocks when at least one of said neighboring blocks is encoded using inter-layer texture prediction (I_BL), including: characterizing the block boundary with a first boundary strength indicator when a luma sample from one of the neighboring blocks is encoded using an intra-prediction mode other than I_BL; characterizing the block boundary with a second boundary strength indicator when no luma sample from the neighboring blocks is encoded using an intra-prediction mode other than I_BL, and any of the neighboring blocks or the blocks from which the neighboring blocks are predicted have non-zero transform coefficients; or characterizing the block boundary with a third boundary strength indicator when no luma sample from the neighboring blocks is encoded using an intra-prediction mode other than I_BL, and all of the neighboring blocks and the blocks from which the neighboring blocks are predicted have no transform coefficients.
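The three-way characterization summarized above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the claimed implementation: the `NeighborBlock` fields and the `BoundaryIndicator` labels are hypothetical names, the numeric enum values are arbitrary, and the function assumes the caller has already established that at least one of the two blocks is I_BL coded.

```python
from dataclasses import dataclass
from enum import Enum


class BoundaryIndicator(Enum):
    # Hypothetical labels; no numeric boundary strength values are assigned here.
    FIRST = 1
    SECOND = 2
    THIRD = 3


@dataclass
class NeighborBlock:
    # True if any luma sample uses an intra-prediction mode other than I_BL.
    intra_other_than_ibl: bool
    # True if this block has non-zero transform coefficients.
    has_coefficients: bool
    # True if the block it is predicted from has non-zero transform coefficients.
    source_has_coefficients: bool


def characterize_ibl_boundary(p: NeighborBlock, q: NeighborBlock) -> BoundaryIndicator:
    """Characterize the p|q boundary; assumes at least one of p, q is I_BL coded."""
    if p.intra_other_than_ibl or q.intra_other_than_ibl:
        return BoundaryIndicator.FIRST
    if any(b.has_coefficients or b.source_has_coefficients for b in (p, q)):
        return BoundaryIndicator.SECOND
    return BoundaryIndicator.THIRD
```

The checks are ordered so that the intra test dominates, matching the order in which the three conditions are stated.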

RELATED REFERENCES

This application is a Divisional of co-pending application Ser. No. 11/350,181, filed on Feb. 7, 2006, which is a regular utility application of U.S. Provisional Application No. 60/663,161, filed Mar. 18, 2005; U.S. Provisional Application No. 60/683,060, filed May 20, 2005; U.S. Provisional Application No. 60/686,676, filed Jun. 1, 2005; and is a continuation-in-part of U.S. patent application Ser. No. 10/112,683, filed on Mar. 29, 2002, which is a continuation of U.S. patent application Ser. No. 09/817,701, filed on Mar. 26, 2001; which is a continuation-in-part of U.S. patent application Ser. No. 10/799,384, filed on Mar. 11, 2004, which is a continuation of PCT Patent Application No. PCT/JP02/09306, filed on Sep. 11, 2002; which is a continuation of U.S. patent application Ser. No. 09/953,329, filed on Sep. 14, 2001, the entire contents of which are hereby incorporated by reference.

FIELD OF THE INVENTION

Embodiments of the present invention comprise methods and systems for image block boundary filtering control. Some embodiments of the present invention comprise methods and systems for characterizing a block boundary between neighboring blocks within a spatial scalability enhancement layer for controlling deblocking filter operations.

BACKGROUND

H.264/MPEG-4 AVC [Joint Video Team of ITU-T VCEG and ISO/IEC MPEG, “Advanced Video Coding (AVC)—4th Edition,” ITU-T Rec. H.264 and ISO/IEC 14496-10 (MPEG4-Part 10), January 2005], which is incorporated by reference herein, is a video codec specification that uses macroblock prediction followed by residual coding to reduce temporal and spatial redundancy in a video sequence for compression efficiency. Spatial scalability refers to a functionality in which parts of a bitstream may be removed while maintaining rate-distortion performance at any supported spatial resolution. Single-layer H.264/MPEG-4 AVC does not support spatial scalability. Spatial scalability is supported by the Scalable Video Coding (SVC) extension of H.264/MPEG-4 AVC.

The SVC extension of H.264/MPEG-4 AVC [Working Document 1.0 (WD-1.0) (MPEG Doc. N6901) for the Joint Scalable Video Model (JSVM)], which is incorporated by reference herein, is a layered video codec in which the redundancy between spatial layers is exploited by inter-layer prediction mechanisms. Three inter-layer prediction techniques are included in the design of the SVC extension of H.264/MPEG-4 AVC: inter-layer motion prediction, inter-layer residual prediction, and inter-layer intra texture prediction.

Block-based motion-compensated video coding is used in many video compression standards, such as H.261, H.263, H.264, MPEG-1, MPEG-2, and MPEG-4. The lossy compression process can create visual artifacts in the decoded images, referred to as image artifacts. Blocking artifacts occur along the block boundaries in an image and are caused by the coarse quantization of transform coefficients.

Image filtering techniques can be used to reduce artifacts in reconstructed images. Reconstructed images are the images produced after decoding and inverse transformation. The rule of thumb in these techniques is that image edges should be preserved while the rest of the image is smoothed. Low-pass filters are carefully chosen based on the characteristics of a particular pixel or set of pixels surrounding the image edges.

Non-correlated image pixels that extend across image block boundaries are specifically filtered to reduce blocking artifacts. However, this filtering can introduce blurring artifacts into the image. If there are few or no blocking artifacts between adjacent blocks, then low-pass filtering needlessly blurs the image while at the same time wasting processing resources.

Previously, only dyadic spatial scalability was addressed by SVC. Dyadic spatial scalability refers to configurations in which the ratio of picture dimensions between two successive spatial layers is a power of 2. New tools have been proposed for configurations in which the ratio of picture dimensions between successive spatial layers is not a power of 2 and in which the pictures of the higher level can contain regions that are not present in corresponding pictures of the lower level, referred to as non-dyadic scaling with cropping window.

All of the inter-layer prediction methods comprise picture up-sampling. Picture up-sampling is the process of generating a higher-resolution image from a lower-resolution image. Some picture up-sampling processes comprise sample interpolation. The prior up-sampling process used in the SVC design was based on the quarter luma sample interpolation procedure specified in H.264 for inter prediction. When applied to spatially scalable coding, the prior method has two drawbacks: the interpolation resolution is limited to quarter samples and thus does not support non-dyadic scaling; and half-sample interpolation is required in order to obtain a quarter-sample position, which makes the method computationally cumbersome. A picture up-sampling process that overcomes these limitations is desired.

SUMMARY

Embodiments of the present invention comprise methods and systems for image encoding and decoding. Some embodiments of the present invention comprise methods and systems for characterization of a block boundary between neighboring blocks within a spatial scalability enhancement layer. In some embodiments, at least one of the neighboring blocks is encoded using inter-layer texture prediction. A block boundary may be characterized with a boundary strength indicator when one of said neighboring blocks meets specified criteria.

The foregoing and other objectives, features, and advantages of the invention will be more readily understood upon consideration of the following detailed description of the invention taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram showing how deblock filtering is selectively skipped according to similarities between adjacent image blocks.

FIG. 2 is a diagram showing two adjacent image blocks having similar motion vectors.

FIG. 3 is a diagram showing how transform coefficients are identified for one of the image blocks.

FIG. 4 is a diagram showing how residual transform coefficients are compared between two adjacent image blocks.

FIG. 5 is a block diagram showing how the video image is encoded and decoded.

FIG. 6 is a block diagram showing how deblock filtering is selectively skipped in a codec.

FIG. 7 is a representation of an existing block-based image filtering technique.

FIG. 8 is a block diagram showing a technique for determining the boundaries to filter and the strength of the respective filter to use.

FIG. 9 is a drawing to explain other embodiments of the present invention.

FIG. 10 is a drawing to explain further embodiments of the present invention.

FIG. 11 is a drawing to explain further embodiments of the present invention.

FIG. 12 is a drawing to explain further embodiments of the present invention.

FIG. 13 is a flow chart describing the steps of an embodiment of the present invention in which deblock filtering between adjacent blocks is dependent on similarity of coding parameters in adjacent blocks.

FIG. 14 is a flow chart describing the steps of an embodiment of the present invention in which deblock filtering between adjacent blocks is dependent on adjacent blocks having similar motion vectors.

FIG. 15 is a flow chart describing the steps of an embodiment of the present invention in which deblock filtering between adjacent blocks is dependent on adjacent blocks having similar motion vectors that point to the same reference frame.

FIG. 16 is a flow chart describing the steps of an embodiment of the present invention in which deblock filtering between adjacent blocks is dependent on adjacent blocks having similar motion vectors that point to adjacent reference blocks in a single reference frame.

FIG. 17 is a flow chart describing the steps of an embodiment of the present invention in which deblock filtering between adjacent blocks is dependent on adjacent blocks having parameters comprising similar D.C. transform coefficients.

FIG. 18 is a flow chart describing the steps of an embodiment of the present invention in which deblock filtering between adjacent blocks is dependent on adjacent blocks having parameters comprising similar A.C. transform coefficients.

FIG. 19 is a flow chart describing the steps of an embodiment of the present invention in which deblock filtering between adjacent blocks is dependent on adjacent blocks in a luminance image having parameters comprising similar motion vectors and similar motion vector targets in a reference frame.

FIG. 20 is a flow chart describing the steps of an embodiment of the present invention in which deblock filtering between adjacent blocks is dependent on adjacent blocks in a luminance image having parameters comprising similar motion vectors, similar motion vector targets in a reference frame and similar transform coefficients.

FIG. 21 is a flow chart describing the steps of an embodiment of the present invention in which an image is split into separate luminance and chrominance channels and deblock filtering between adjacent blocks in each luminance or chrominance image is dependent on adjacent blocks in a luminance image having parameters comprising similar motion vectors.

FIG. 22 is a flow chart describing the steps of an embodiment of the present invention in which an image is split into separate luminance and chrominance channels and deblock filtering between adjacent blocks in each luminance or chrominance image is dependent on adjacent blocks in a luminance image having parameters comprising similar motion vectors, similar motion vector targets in a reference frame and similar transform coefficients.

FIG. 23 is a diagram showing the geometric relationship between a base spatial layer and an enhancement spatial layer in some embodiments of the present invention;

FIG. 24 is a diagram showing the geometric relationship between an upsampled base layer picture and an enhancement layer picture of some embodiments of the present invention;

FIG. 25 is a diagram showing pixels of a 4×4 block;

FIG. 26 is a diagram showing 4×4 blocks within an 8×8 block;

FIG. 27 is a diagram showing 8×8 blocks of a prediction macroblock;

FIG. 28 is a flow chart showing an exemplary method for characterizing block boundaries based on neighboring block attributes;

FIG. 29 is a flow chart showing an alternative exemplary method for characterizing block boundaries based on neighboring block attributes; and

FIG. 30 is a flow chart showing another alternative exemplary method for characterizing block boundaries based on neighboring block attributes.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

Embodiments of the present invention will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The figures listed above are expressly incorporated as part of this detailed description.

It will be readily understood that the components of the present invention, as generally described and illustrated in the figures herein, could be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of the embodiments of the methods and systems of the present invention is not intended to limit the scope of the invention, but is merely representative of the presently preferred embodiments of the invention.

Elements of embodiments of the present invention may be embodied in hardware, firmware and/or software. While exemplary embodiments revealed herein may only describe one of these forms, it is to be understood that one skilled in the art would be able to effectuate these elements in any of these forms while resting within the scope of the present invention.

Conventional filtering processes consider a single reconstructed image frame at a time. Block-based video encoding techniques may use motion vectors to estimate the movement of blocks of pixels. The motion-vector information is available at both the encoder and decoder but is not used by conventional filtering processes. For example, if two adjacent blocks share the same motion vector with respect to the same reference image frame (in a system with multiple reference frames), there is likely no significant difference between the image residuals of the two blocks, and accordingly their shared boundary should not be filtered. In essence, adjacent portions of the image have the same motion with respect to the same reference frame, and accordingly no significant difference between the image residuals would be expected. In many cases, the block boundary of these two adjacent blocks may have been filtered in the reference frame and should therefore not be filtered again for the current frame. If a deblock filter is used without considering this motion-vector information, the conventional filtering process might filter the same boundary again and again from frame to frame. This unnecessary filtering not only causes unnecessary blurring but also results in additional filter computations.

FIG. 1 illustrates an image 12 in which blocking artifacts are selectively filtered according to similarities between image blocks. It is to be understood that the image may likewise use non-square blocks or any other sets of pixels. The borders between some of the blocks 14 include blocking artifacts 18. In general, blocking artifacts are any image discontinuities between blocks 14 that may result from the encoding and/or decoding process. A low-pass filter or other filter may be used to reduce the blocking artifacts that exist at the borders of adjacent image blocks.

For example, blocking artifacts 24 exist between blocks 20 and 22. A low-pass filter may be used at the border 26 between blocks 20 and 22 to remove or otherwise reduce the blocking artifacts 24. The low-pass filter, for example, selects a group of pixels 28 from both sides of the border 26. An average pixel value, or any other statistical measure, is derived from the group of pixels 28. Then each individual pixel is compared to the average pixel value. Any pixels in group 28 outside of a predetermined range of the average pixel value are then replaced with the average pixel value.
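The average-and-replace filter just described can be sketched in a few lines. This is a minimal sketch, assuming the statistical measure is the arithmetic mean and that `max_deviation` (a hypothetical parameter) stands in for the predetermined range; pixels whose distance from the mean exceeds it are replaced.

```python
from typing import List, Sequence


def average_replace_filter(pixels: Sequence[float], max_deviation: float) -> List[float]:
    """Replace pixels that stray too far from the group average.

    pixels: samples taken from both sides of a block border (group 28).
    max_deviation: hypothetical stand-in for the "predetermined range".
    """
    average = sum(pixels) / len(pixels)
    # Pixels within the range are kept; outliers are replaced by the average.
    return [average if abs(p - average) > max_deviation else p for p in pixels]


# The sample 40 lies 21.75 away from the mean of 18.25, so it is replaced.
print(average_replace_filter([10, 12, 11, 40], max_deviation=10))
```

Other statistics, such as the median, could be substituted for the mean without changing the structure of the filter.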

As previously described, if there are few or no blocking artifacts 24 between the adjacent blocks, then the groups of pixels 28 may be needlessly filtered, causing blurring in the image. A skip mode filtering scheme may use the motion estimation and/or compensation information for adjacent image blocks as a basis upon which to selectively filter. If the motion estimation and compensation information is sufficiently similar, the filtering may be skipped. This avoids unnecessary image blurring and significantly reduces the required number of filtering operations.

As an example, it may be determined during the encoding process that adjacent image blocks 30 and 32 have similar coding parameters. Accordingly, the deblock filtering may be skipped for the groups of pixels 34 that extend across the border 31 between adjacent blocks 30 and 32. Skip mode filtering can be used for horizontal, vertical, or any other boundary between adjacent blocks in the image 12.

FIG. 2 illustrates a reference frame 42, a reference frame 48, and a current frame 40 that is currently being encoded or decoded. The coding parameters for blocks 44 and 46 are compared to determine whether the deblock filtering should be skipped between the two adjacent blocks 44 and 46. One of the encoding parameters that may be compared is the motion vector (MV) of each of the blocks 44 and 46.

A motion vector MV1 points from block 44 in the current image frame 40 to an associated block 44′ in the reference image 42. A motion vector MV2 points from block 46 in the current image frame 40 to an associated block 46′ in the reference frame 42. Skip mode filtering checks whether the motion vectors MV1 and MV2 point to adjacent blocks in the same reference frame 42. If the motion vectors point to adjacent blocks in the same reference frame (MV1=MV2), then the deblock filtering may be skipped. This motion vector information may be used along with other coding information to decide whether to skip deblock filtering between the two image blocks 44 and 46.

More than one reference frame may be used during the encoding and decoding process. For example, there may be another reference frame 48. The adjacent blocks 44 and 46 may have motion vectors pointing to different reference frames. In one example, the decision to skip deblock filtering depends on whether the motion vectors for the two adjacent blocks point to the same reference frame. For example, image block 44 may have a motion vector 49 pointing to reference frame 48 and image block 46 may have the motion vector MV2 pointing to reference frame 42. The deblock filtering is not skipped in this example because the motion vectors 49 and MV2 point to different reference frames.
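The motion-vector and reference-frame test described for FIG. 2 reduces to two equality checks. The sketch below is illustrative only; `MotionInfo` and `skip_by_motion` are hypothetical names, and the reference frames are represented by plain integer labels.

```python
from typing import NamedTuple, Tuple


class MotionInfo(NamedTuple):
    mv: Tuple[int, int]  # motion vector (x, y), same units for both blocks
    ref: int             # label of the reference frame the vector points into


def skip_by_motion(a: MotionInfo, b: MotionInfo) -> bool:
    """Skip deblocking only if both blocks use the same reference frame and motion vector."""
    return a.ref == b.ref and a.mv == b.mv


# Blocks 44 and 46 with identical vectors into frame 42 may skip filtering,
# but not when block 44 points into frame 48 instead.
print(skip_by_motion(MotionInfo((3, -1), 42), MotionInfo((3, -1), 42)))
print(skip_by_motion(MotionInfo((3, -1), 48), MotionInfo((3, -1), 42)))
```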

FIG. 3 illustrates another example of a coding parameter that may be used to decide whether or not to selectively skip deblock filtering. The image block 44 from image frame 40 is compared with reference block 44′ from the reference frame 42 pointed to by the motion vector MV1, as previously illustrated in FIG. 2. A residual block 44″ is output from the comparison between image block 44 and reference block 44′. A transform 50 is performed on the residual block 44″, creating a transformed block 44″ of transform coefficients. In one example, the transform 50 is a Discrete Cosine Transform. The transformed block 44″ includes a D.C. component 52 and A.C. components 53.

The D.C. component 52 refers to the lowest-frequency transform coefficient in image block 44, i.e., the coefficient that represents the average energy in the image block 44. The A.C. components 53 refer to the transform coefficients that represent the higher-frequency components in the image block 44, for example, the transform coefficients that represent large energy differences between pixels in the image block 44.

FIG. 4 illustrates the transformed residual blocks 44″ and 46″. The D.C. components 52 from the two transformed blocks 44″ and 46″ are compared in processor 54. If the D.C. components are the same or within some range of each other, the processor 54 notifies a deblock filter operation 56 to skip deblock filtering at the border between the two adjacent blocks 44 and 46. If the D.C. components 52 are not similar, then no skip notification is initiated and the border between blocks 44 and 46 is deblock filtered.
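The residual, transform, and D.C. comparison steps of FIGS. 3 and 4 can be sketched as follows. This is a minimal sketch under stated assumptions: a floating-point orthonormal 2-D DCT-II stands in for the integer transform a real codec would use, and `tolerance` is a hypothetical parameter for "within some range of each other".

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def residual(current: Sequence[Sequence[float]], reference: Sequence[Sequence[float]]) -> Matrix:
    """Residual block 44'' = image block minus motion-compensated reference block."""
    return [[c - r for c, r in zip(cur_row, ref_row)]
            for cur_row, ref_row in zip(current, reference)]


def dct2(block: Sequence[Sequence[float]]) -> Matrix:
    """Orthonormal 2-D DCT-II of an n x n block (floating point)."""
    n = len(block)

    def scale(k: int) -> float:
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def dc_similar(coeffs_j: Matrix, coeffs_k: Matrix, tolerance: float) -> bool:
    """True if the D.C. terms are within tolerance, i.e. filtering may be skipped."""
    return abs(coeffs_j[0][0] - coeffs_k[0][0]) <= tolerance
```

For a flat 4×4 residual of value 2, all A.C. terms vanish and the D.C. term equals the block sum divided by 4, i.e. 8.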

In one example, the skip mode filtering may be incorporated into the H.26L encoding scheme proposed by the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T). The H.26L scheme uses 4×4 integer Discrete Cosine Transform (DCT) blocks. If desired, only the D.C. component of the two adjacent blocks may be checked. However, some limited low-frequency A.C. coefficients may likewise be checked, especially when the image blocks are larger, such as 8×8 or 16×16 blocks. For example, the upper D.C. component 52 and the three lower-frequency A.C. transform coefficients 53 for block 44″ may be compared with the upper D.C. component 52 and three lower-frequency A.C. transform coefficients 53 for block 46″. Different combinations of D.C. and/or any of the A.C. transform coefficients can be used to identify the relative similarity between the two adjacent blocks 44 and 46.
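Comparing the D.C. term together with a few low-frequency A.C. terms generalizes the D.C.-only test. In the sketch below, the choice of positions (0,0), (0,1), (1,0), (1,1) as "D.C. plus three lower-frequency A.C. coefficients" is an assumption, as is the single shared `tolerance`; other position sets or per-coefficient thresholds would fit the same description.

```python
from typing import Sequence, Tuple

# Hypothetical choice of "D.C. plus three lower-frequency A.C." positions (row, col).
LOW_FREQUENCY_POSITIONS: Tuple[Tuple[int, int], ...] = ((0, 0), (0, 1), (1, 0), (1, 1))


def low_frequency_similar(coeffs_j: Sequence[Sequence[float]],
                          coeffs_k: Sequence[Sequence[float]],
                          tolerance: float,
                          positions: Tuple[Tuple[int, int], ...] = LOW_FREQUENCY_POSITIONS) -> bool:
    """True if every selected coefficient differs by at most `tolerance`."""
    return all(abs(coeffs_j[u][v] - coeffs_k[u][v]) <= tolerance for u, v in positions)
```

Coefficients outside the selected positions are ignored, so two blocks that differ only in high-frequency detail are still treated as similar.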

The processor 54 can also receive other coding parameters 55 that are generated during the coding process. These coding parameters include the motion vectors and reference frame information for the adjacent blocks 44 and 46, as previously described. The processor 54 may use some or all of these coding parameters to determine whether or not to skip deblock filtering between adjacent image blocks 44 and 46. Other encoding and transform functions performed on the image may be carried out in the same processor 54 or in a different processing circuit. In the case where all or most of the coding is done in the same processor, the skip mode is simply enabled by setting a skip parameter in the filtering routine.

FIG. 5 shows how skip mode filtering may be used in a block-based motion-compensated Coder-Decoder (Codec) 60. The codec 60 is used for inter-frame coding. An input video block from the current frame is fed from box 62 into a comparator 64. The output of a frame buffering box 80 generates a reference block 81 according to the estimated motion vector (and possibly a reference frame number). The difference between the input video block and the reference block 81 is transformed in box 66 and then quantized in box 68. The quantized transform block is encoded by a Variable Length Coder (VLC) in box 70 and then transmitted, stored, etc.

The encoding section of the codec 60 reconstructs the transformed and quantized image by first Inverse Quantizing (IQ) the transformed image in box 72. The inverse quantized image is then inverse transformed in box 74 to generate a reconstructed residual image. This reconstructed residual block is then added in box 76 to the reference block 81 to generate a reconstructed image block. Generally, the reconstructed image is loop filtered in box 78 to reduce blocking artifacts caused by the quantization and transform process. The filtered image is then buffered in box 80 to form reference frames. The frame buffering in box 80 uses the reconstructed reference frames for motion estimation and compensation. The reference block 81 is compared to the input video block in comparator 64. An encoded image is output at node 71 from the encoding section and is then either stored or transmitted.
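The encoder-side reconstruction loop of FIG. 5 (comparator 64, boxes 66, 68, 72, 74, 76) can be traced on a toy example. To keep the sketch short, it makes several simplifying assumptions that are not part of the description above: a one-dimensional four-sample block, a 4-point Walsh-Hadamard transform in place of the DCT, a uniform scalar quantizer with a hypothetical step `qstep`, and no VLC or loop filter.

```python
from typing import List, Sequence, Tuple

# 4-point Walsh-Hadamard matrix: symmetric, and H times H equals 4 * I, so its inverse is H / 4.
H = [[1, 1, 1, 1],
     [1, 1, -1, -1],
     [1, -1, -1, 1],
     [1, -1, 1, -1]]


def transform(v: Sequence[float]) -> List[float]:  # box 66
    return [sum(h * x for h, x in zip(row, v)) for row in H]


def inverse_transform(c: Sequence[float]) -> List[float]:  # box 74
    return [sum(h * x for h, x in zip(row, c)) / 4 for row in H]


def quantize(c: Sequence[float], qstep: float) -> List[int]:  # box 68
    return [round(x / qstep) for x in c]


def inverse_quantize(levels: Sequence[int], qstep: float) -> List[float]:  # box 72
    return [level * qstep for level in levels]


def encode_and_reconstruct(block: Sequence[float], reference: Sequence[float],
                           qstep: float) -> Tuple[List[int], List[float]]:
    residual = [b - r for b, r in zip(block, reference)]                  # comparator 64
    levels = quantize(transform(residual), qstep)                        # boxes 66 and 68; sent to VLC 70
    recon_residual = inverse_transform(inverse_quantize(levels, qstep))  # boxes 72 and 74
    reconstructed = [r + d for r, d in zip(reference, recon_residual)]   # box 76, before loop filter 78
    return levels, reconstructed
```

With `qstep=1` the loop is lossless for this input; with a coarser step the reconstructed block departs from the input, and that quantization error is what the loop filter in box 78 is meant to soften.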

In a decoder portion of the codec 60, a variable length decoder (VLD) decodes the encoded image in box 82. The decoded image is inverse quantized in box 84 and inverse transformed in box 86. The reconstructed residual image from box 86 is added in the summing box 88 to the reference block 91 before being loop filtered in box 90 to reduce blocking artifacts and buffered in box 92 as reference frames. The reference block 91 is generated from box 92 according to the received motion vector information. The loop filtered output from box 90 can optionally be post filtered in box 94 to further reduce image artifacts before being displayed as a video image in box 96. The skip mode filtering scheme can be performed in any combination of the filtering functions in boxes 78, 90 and 94.

The motion estimation and compensation information available during video coding is used to determine when to skip deblock filtering in boxes 78, 90 and/or 94. Since these coding parameters are already generated during the encoding and decoding process, no additional coding parameters have to be generated or transmitted specifically for skip mode filtering.

FIG. 6 shows in further detail how skip mode filtering may be used in the filters 78, 90, and/or 94 in the encoder and decoder in FIG. 5. The interblock boundary between any two adjacent blocks “j” and “k” is first identified in box 100. The two blocks may be horizontally or vertically adjacent in the image frame. Decision box 102 compares the motion vector mv(j) for block j with the motion vector mv(k) for block k. It is first determined whether the two adjacent blocks j and k have the same motion vector pointing to the same reference frame. In other words, the motion vectors for the adjacent blocks point to adjacent blocks (mv(j)=mv(k)) in the same reference frame (ref(j)=ref(k)).

It is then determined whether the residual coefficients for the two adjacent blocks are similar. If there is no significant difference between the image residuals of the adjacent blocks, for example, if the two blocks j and k have the same or similar D.C. component (dc(j)≈dc(k)), then the deblock filtering process in box 104 is skipped. Skip mode filtering then moves to the next interblock boundary in box 106 and conducts the next comparison in decision box 102. Skip mode filtering can be performed for both horizontally adjacent blocks and vertically adjacent blocks.

In one embodiment, only the reference frame and motion vector information for the adjacent image blocks are used to determine block skipping. In another embodiment, only the D.C. and/or A.C. residual coefficients are used to determine block skipping. In another embodiment, the motion vector, reference frame and residual coefficients are all used to determine block skipping.
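The three embodiments listed above differ only in which tests feed the skip decision, so they can share one function with a mode switch. This is a sketch under assumptions: `EdgeSide`, `should_skip`, the `mode` strings, and `dc_tolerance` are hypothetical names, and only the D.C. coefficient represents the residual.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class EdgeSide:
    mv: Tuple[int, int]  # motion vector of the block
    ref: int             # reference frame index
    dc: float            # D.C. residual coefficient


def should_skip(j: EdgeSide, k: EdgeSide, mode: str = "both", dc_tolerance: float = 0.0) -> bool:
    """Decide whether deblocking of the j|k boundary may be skipped."""
    motion_match = j.ref == k.ref and j.mv == k.mv
    residual_match = abs(j.dc - k.dc) <= dc_tolerance
    if mode == "motion":     # reference frame and motion vector only
        return motion_match
    if mode == "residual":   # residual coefficients only
        return residual_match
    if mode == "both":       # all three parameters
        return motion_match and residual_match
    raise ValueError(f"unknown mode: {mode}")
```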

The skip mode filtering scheme can be applied to spatially subsampled chrominance channels. For example, in a case with 4:2:0 color format sequences, skip mode filtering for block boundaries may rely only on the equality of motion vectors and D.C. components for the luminance component of the image. If the motion vectors and the D.C. components are the same, deblock filtering is skipped for both the luminance and chrominance components of the adjacent image blocks. In another embodiment, the motion vectors and the D.C. components are considered separately for each luminance and chrominance component of the adjacent blocks. In this case, a luminance or chrominance component for adjacent blocks may be deblock filtered while the other luminance or chrominance components for the same adjacent blocks are not deblock filtered.
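The two chrominance embodiments can be contrasted in a short sketch: either the luminance decision is shared by all components, or each component is tested on its own parameters. The dictionary layout keyed by "Y", "Cb", "Cr" with (motion vector, D.C.) pairs is a hypothetical representation chosen for illustration.

```python
from typing import Dict, Tuple

# Per-component parameters: (motion vector, D.C. coefficient). Hypothetical layout.
Params = Dict[str, Tuple[Tuple[int, int], float]]


def skip_per_component(j: Params, k: Params, luma_only: bool = True) -> Dict[str, bool]:
    """Return a skip decision for each component of the j|k boundary."""
    def same(a: Tuple[Tuple[int, int], float], b: Tuple[Tuple[int, int], float]) -> bool:
        return a[0] == b[0] and a[1] == b[1]  # equal motion vectors and D.C. components

    if luma_only:
        decision = same(j["Y"], k["Y"])  # one luminance decision drives every component
        return {comp: decision for comp in j}
    return {comp: same(j[comp], k[comp]) for comp in j}
```

In the per-component mode, a chroma edge whose D.C. terms differ is filtered even when the luma edge is skipped.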

Referring to FIG. 7, some known techniques define a “block strength” parameter for the loop filter to control the loop filtering process. Each block of an image has a strength value that is associated with the block and controls the filtering performed on all four of its block boundaries. The block strength value is derived based on the motion vectors and the transform coefficients available in the bitstream. However, after considering the use of a single block strength value for all four edges of the block, the present inventors came to the realization that this removes some blocking artifacts at some edges while unnecessarily blurring along other edges.

In contrast to the block-by-block manner of filtering, the present inventors came to the realization that filtering determinations should be made in an edge-by-edge manner together with other information. The other information may include, for example, intra-block encoding of blocks, motion estimation of blocks with residual information, motion estimation of blocks without residual information, and motion estimation of blocks without residuals having sufficient differences. One, two, three, or four of these information characteristics may be used to improve filtering abilities in an edge-by-edge manner. Based upon different sets of characteristics, the filtering may be modified, as desired.

For each block boundary, a control parameter is preferably defined, namely, a boundary strength Bs. Referring to FIG. 8, a pair of blocks sharing a common boundary are referred to as j and k. A first block 110 checks to see if either one of the two blocks is intra-coded. If either is intra-coded, then the boundary strength is set to three at block 112. Equivalently, block 110 determines whether both of the blocks are motion predicted. If no motion prediction is used for a block, then that block derives from the frame itself, and accordingly filtering should be performed on the boundary. This is normally appropriate because intra-coded block boundaries normally include blocking artifacts.

If both of the blocks j and k are, at least in part, predicted from a previous or future frame, then the blocks j and k are checked at block 114 to determine if any coefficients are coded. The coefficients may be, for example, discrete cosine transform coefficients. If either of the blocks j and k includes non-zero coefficients, then at least one of the blocks represents a prediction from a previous or future frame together with modifications to the block using the coefficients, generally referred to as residuals. If either of the blocks j and k includes non-zero coefficients (and is motion predicted), then the boundary strength is set to two at block 116. This represents an occurrence where the images are predicted but the prediction is corrected using a residual. Accordingly, the images are likely to include blocking artifacts.

If both of the blocks j and k are motion predicted and do not include non-zero coefficients, generally referred to as residuals, then a determination is made at block 118 to check if the pixels on either side of the boundary are sufficiently different from one another. This may likewise be used to determine if the residuals are sufficiently small. If a sufficient difference exists, then a blocking artifact is likely to exist. Initially, a determination is made as to whether the two blocks use different reference frames, namely, R(j)≠R(k). If the blocks j and k are from two different reference frames, then the boundary strength is assigned a value of one at block 120. Alternatively, the absolute difference of the motion vectors of the two image blocks is checked to determine whether it is greater than or equal to 1 pixel in either the vertical or horizontal direction, namely, |V(j,x)−V(k,x)|≥1 pixel or |V(j,y)−V(k,y)|≥1 pixel. Other threshold values may likewise be used, as desired, including less than or greater than depending on the test used. If the absolute difference of the motion vectors is greater than or equal to one, then the boundary strength is assigned a value of one.

If the two blocks j and k are motion predicted, without residuals, are based upon the same frame, and have insignificant differences, then the boundary strength is assigned a value of zero. If the boundary strength is assigned a value of zero, the boundary is not filtered or is otherwise adaptively filtered according to the value of the boundary strength. It is to be understood that the system may lightly filter if the boundary strength is zero, if desired.
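The FIG. 8 decision sequence can be sketched as a single function that returns Bs for one edge. This is an illustrative sketch, not a normative derivation: the `Block` fields are hypothetical, motion vectors are assumed to be expressed in whole-pixel units, and `mv_threshold` defaults to the 1-pixel threshold given above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Block:
    intra: bool              # intra-coded (not motion predicted)
    nonzero_coeffs: bool     # has coded non-zero transform coefficients
    ref: int                 # reference frame R(.)
    mv: Tuple[float, float]  # motion vector V(.) in pixels, (x, y)


def boundary_strength(j: Block, k: Block, mv_threshold: float = 1.0) -> int:
    """Boundary strength Bs for the edge shared by blocks j and k."""
    if j.intra or k.intra:                    # block 110 -> block 112
        return 3
    if j.nonzero_coeffs or k.nonzero_coeffs:  # block 114 -> block 116
        return 2
    if j.ref != k.ref:                        # block 118 -> block 120
        return 1
    if (abs(j.mv[0] - k.mv[0]) >= mv_threshold
            or abs(j.mv[1] - k.mv[1]) >= mv_threshold):
        return 1
    return 0                                  # same frame, insignificant motion difference
```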

The value of the boundary strength, namely, one, two, or three, is used to control the pixel value adaptation range in the loop filter. If desired, each different boundary strength may be the basis of a different filtering. For example, in some embodiments, three kinds of filters may be used, wherein a first filter is used when Bs=1, a second filter is used when Bs=2 and a third filter is used when Bs=3. It is to be understood that non-filtering may be performed by minimal filtering in comparison to other filtering which results in a more significant difference. In the example shown in FIG. 8, the larger the value of Bs, the greater the filtering. The filtering may be performed by any suitable technique, such as methods described in the Joint Committee Draft (CD) of the Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG (JVT-C167) or other known methods for filtering image artifacts.
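One way Bs can control the pixel value adaptation range is to clip the correction applied to the two samples nearest the edge. The sketch below borrows the form of the H.264 normal-mode correction for p0 and q0, but the per-Bs limit table is hypothetical, and the sample-activity tests a real deblocking filter applies before filtering are omitted.

```python
# Hypothetical per-Bs limits on how far p0/q0 may move; larger Bs allows more change.
ADAPTATION_LIMIT = {0: 0, 1: 1, 2: 2, 3: 4}


def clip(lo: int, hi: int, x: int) -> int:
    return max(lo, min(hi, x))


def filter_edge_pair(p1: int, p0: int, q0: int, q1: int, bs: int) -> tuple:
    """Adjust the two samples adjacent to an edge (p0 | q0), limited by Bs."""
    limit = ADAPTATION_LIMIT[bs]
    delta = ((q0 - p0) * 4 + (p1 - q1) + 4) >> 3
    delta = clip(-limit, limit, delta)
    # Keep the filtered samples in the 8-bit range.
    return clip(0, 255, p0 + delta), clip(0, 255, q0 - delta)
```

For a step edge from 10 to 20, the unclipped correction is 4, so Bs=3 moves the edge samples to 14 and 16, Bs=1 only to 11 and 19, and Bs=0 leaves them unchanged.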

Skip mode filtering can be used with any system that encodes or decodes multiple image frames, for example, DVD players, video recorders, or any system that transmits image data over a communications channel, such as over television channels or over the Internet. It is to be understood that the system may use the quantization parameter as a coding parameter, either alone or in combination with other coding parameters. In addition, it is to be understood that the system may be free from using the quantization parameter alone or free from using the quantization parameter at all for purposes of filtering.

The skip mode filtering described above can be implemented with dedicated processor systems, microcontrollers, programmable logic devices, or microprocessors that perform some or all of the operations. Some of the operations described above may be implemented in software and other operations may be implemented in hardware.

For the sake of convenience, the operations are described as various interconnected functional blocks or distinct software modules. This is not necessary, however, and there may be cases where these functional blocks or modules are equivalently aggregated into a single logic device, program, or operation with unclear boundaries. In any event, the functional blocks, software modules, and described features can be implemented by themselves, or in combination with other operations, in either hardware or software.

In some embodiments of the present invention, as illustrated in FIG. 9, image data 902 may be input to an image data encoding apparatus 904, which includes the adaptive filtering portion described above for some embodiments of the present invention. The output of the image data encoding apparatus 904 is encoded image data, which may then be stored on any computer-readable storage media 906. The storage media may include, but are not limited to, disc media, memory card media, or digital tape media. Storage media 906 may act as a short-term storage device. The encoded image data may be read from storage media 906 and decoded by an image data decoding apparatus 908, which includes the adaptive filtering portion described above for some embodiments of the present invention. The decoded image data 910 may be output to a display or other device.

In some embodiments of the present invention, as illustrated in FIG. 10, image data 1002 may be encoded and the encoded image data may then be stored on storage media 1006; image data decoding apparatus 1008 is the same as shown in FIG. 9. In FIG. 10, Bs data encoding portion 1012 receives the value of the boundary strength Bs for each block boundary and encodes it by any data encoding method, including DPCM, multi-value run-length coding, lossless transform coding, and so on. The boundary strength Bs may be generated as described in FIG. 8. The encoded boundary strength may then be stored on storage media 1006. In one example, the encoded boundary strength may be stored separately from the encoded image data. In another example, the encoded boundary strength and the encoded image data may be multiplexed before being stored on the storage media 1006.
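As one illustration of what Bs data encoding portion 1012 could do, the per-boundary Bs values might be DPCM-coded and the resulting differences run-length coded. This is a hypothetical sketch: it assumes the Bs values arrive as a list of small integers in a fixed scan order, and the functions `encode_bs` and `decode_bs` and the (difference, run) pair format are illustrative, not a format defined by the source.

```python
def encode_bs(bs_values):
    """DPCM followed by run-length coding of a list of boundary strengths."""
    # DPCM: the first value is coded against 0, later values against their predecessor.
    diffs = [v - p for p, v in zip([0] + bs_values[:-1], bs_values)]
    # Run-length code the differences as (difference, run length) pairs.
    runs = []
    for d in diffs:
        if runs and runs[-1][0] == d:
            runs[-1][1] += 1
        else:
            runs.append([d, 1])
    return [tuple(r) for r in runs]

def decode_bs(runs):
    """Invert encode_bs: expand the runs, then accumulate the DPCM differences."""
    values, prev = [], 0
    for d, n in runs:
        for _ in range(n):
            prev += d
            values.append(prev)
    return values
```

Because neighboring boundaries often share the same Bs, DPCM turns such stretches into runs of zero differences, which the run-length stage then codes compactly.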

The encoded boundary strength may be read from the storage media 1006 and decoded by Bs data decoding portion 1014, which inputs the decoded boundary strength to image data decoding apparatus 1008. When the decoded boundary strength is utilized in image data decoding apparatus 1008 to perform the adaptive filtering of the present invention, it may not be necessary to repeat the process described in FIG. 8 to generate the boundary strength, which may save processing power for the adaptive filtering.

In some embodiments of the present invention, as illustrated in FIG. 11, image data 1102 may be input to an image data encoding apparatus 1104, which includes the adaptive filtering portion described above for some embodiments of the present invention. The output of the image data encoding apparatus 1104 is encoded image data, which may then be sent over a network, such as a LAN, WAN or the Internet 1106. The encoded image data may be received and decoded by an image data decoding apparatus 1108, which also communicates with network 1106. The image data decoding apparatus 1108 includes the adaptive filtering portion described above for some embodiments of the present invention. The decoded image data 1110 may be output to a display or other device.

In some embodiments of the present invention, as illustrated in FIG. 12, image data 1202 may be encoded and the encoded image data may then be sent over a network, such as a LAN, WAN or the Internet 1206. The basic procedure of image data encoding apparatus 1204 and image data decoding apparatus 1208 is the same as in FIG. 11. In FIG. 12, Bs data encoding portion 1212 receives the value of the boundary strength Bs for each block and encodes it by any data encoding method, including DPCM, multi-value run-length coding, lossless transform coding, and so on. The boundary strength Bs may be generated as described in FIG. 11. The encoded boundary strength may then be sent over the network 1206. In one example, the encoded boundary strength may be sent separately from the encoded image data. In other examples, the encoded boundary strength and the encoded image data may be multiplexed before being sent over the network 1206.

The encoded boundary strength may be received from the network 1206 and decoded by Bs data decoding portion 1214, which inputs the decoded boundary strength to image data decoding apparatus 1208. When the decoded boundary strength is used to perform the adaptive filtering of the present invention, it may not be necessary to repeat the process described in FIG. 11 to generate the boundary strength, which may save processing power for the adaptive filtering.

Some embodiments of the present invention may be described with reference to FIG. 13. In these systems and methods, adjacent blocks 150 in a video frame are identified and coding parameters for these adjacent blocks are identified. The coding parameters for the adjacent blocks are then compared to determine their similarity 154. When the coding parameters are not similar, a deblock filter 156 is applied along the boundary between the adjacent blocks. When the coding parameters are similar, deblock filtering is skipped and the process proceeds to the next step 158. Likewise, when deblock filtering is performed, the process proceeds to the next step 158 after filtering.

In some embodiments of the present invention, as shown in FIG. 14, the coding parameters are motion vectors. In these embodiments, adjacent blocks in a video frame are identified 160 and coding parameters 162 comprising motion vectors are identified. These motion vectors are compared to determine their similarity 164. When the motion vectors are not similar, deblock filtering may be performed 166 between the adjacent blocks and the process may proceed to its next step 168. When the motion vectors are similar, deblock filtering is skipped and the next step 168 is accomplished directly.

Other embodiments of the present invention, as shown in FIG. 15, may use multiple coding parameters to determine whether to skip filtering. In these embodiments, adjacent blocks are identified 170 and coding parameters 172 are determined for the adjacent blocks. These coding parameters may comprise motion vector attributes, including the target frame of the motion vectors. When the motion vectors of adjacent blocks are not similar 174, deblock filtering may be performed 176 between the adjacent blocks. When the motion vectors are similar 174, other parameters may be used to further qualify the filtering process. In this example, the motion vectors may be compared to determine whether they point to the same reference frame 178. If the vectors do not point to the same reference frame, deblock filtering may be performed between the blocks 176. If the vectors point to the same reference frame, filtering may be skipped and the process may proceed to the next step 179.

Further motion vector parameters may be used to determine filtering. In the embodiments illustrated in FIG. 16, the location of the blocks to which the vectors point is a parameter that may be used to determine filtering options. In these embodiments, adjacent blocks are identified 200 and coding parameters are identified for the adjacent blocks 202. Motion vectors are then compared to determine their similarity 204. If the vectors are not similar, deblock filtering may proceed 208. If the motion vectors are similar, another comparison may be made to determine whether the motion vectors of the adjacent blocks point to the same reference frame. If the vectors do not point to the same frame, deblock filtering may proceed 208. If the vectors do point to the same reference frame, the blocks to which the vectors point may be compared 210. When the motion vectors do not point to adjacent blocks in the same reference frame, deblock filtering may proceed 208. When the vectors point to adjacent blocks in the same reference frame, deblock filtering may be skipped and a next step 212 may be executed. In this manner, adjacent blocks which reference adjacent blocks in a reference frame, and which are therefore not likely to have significant artifacts between them, are not deblock filtered. Skipping the deblock filter avoids the blurring and image degradation caused by the filtering process. Processing time is also conserved because unnecessary filtering is avoided. Image quality is thereby improved and fewer calculations are required in the process. It should be noted that various combinations of these motion vector parameters may be used to determine filter skipping. These myriad combinations are not specifically described in detail, but are thought to be within the grasp of one skilled in the art and are intended to fall within the scope of the appended claims.
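The FIG. 16 cascade (similar vectors, same reference frame, adjacent referenced blocks) can be expressed as a single predicate. This is a sketch under stated assumptions: "similar" is taken to mean each motion-vector component differs by less than a hypothetical tolerance `tol`, vectors are integer pixel offsets, and "the referenced blocks are adjacent" is interpreted as the referenced blocks having the same relative displacement as the current blocks. `InterBlock`, `mv_similar`, and `skip_deblocking` are illustrative names.

```python
from dataclasses import dataclass

@dataclass
class InterBlock:
    pos: tuple        # (x, y) of the block's top-left pixel
    mv: tuple         # (dx, dy) motion vector in whole pixels (assumed)
    ref_frame: int    # index of the reference frame

def mv_similar(a, b, tol=2):
    # Hypothetical similarity test: each component differs by less than tol.
    return abs(a[0] - b[0]) < tol and abs(a[1] - b[1]) < tol

def skip_deblocking(j: InterBlock, k: InterBlock) -> bool:
    """Return True when filtering of the j/k boundary may be skipped (FIG. 16)."""
    if not mv_similar(j.mv, k.mv):
        return False                      # vectors not similar (204): filter (208)
    if j.ref_frame != k.ref_frame:
        return False                      # different reference frames: filter (208)
    # Positions the two vectors point to in the reference frame (210).
    tj = (j.pos[0] + j.mv[0], j.pos[1] + j.mv[1])
    tk = (k.pos[0] + k.mv[0], k.pos[1] + k.mv[1])
    # Skip only if the referenced blocks sit side by side exactly as j and k do.
    return (tk[0] - tj[0], tk[1] - tj[1]) == (k.pos[0] - j.pos[0], k.pos[1] - j.pos[1])
```

With a loose similarity tolerance, the final adjacency test is what separates "nearly equal" vectors from ones that really reference neighboring blocks; with an exact-match similarity test it always passes.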

Further embodiments of the present invention may utilize transform coefficients to determine whether deblock filtering should occur. In reference to FIG. 17, adjacent blocks 180 in a frame are identified and coding parameters are identified for the adjacent blocks 182. These coding parameters may comprise motion vector parameters as well as transform coefficients.

Motion vectors are then compared 184 to determine similarity. If the motion vectors are not similar, deblock filtering may be performed 186. If the motion vectors are similar, the motion vector data are analyzed to determine whether the motion vectors point to the same reference frame. If the motion vectors do not point to the same reference frame 185, filtering may proceed 186.

If the motion vectors point to the same reference frame 185, transform coefficients may be compared to further qualify filtering processes. In this example, DC transform coefficients obtained through Discrete Cosine Transform (DCT) methods or other methods may be compared for the adjacent blocks. If the DC transform coefficients are not similar 187, deblock filtering may be performed 186. If the DC transform coefficients are similar, filtering may be skipped and the methods and systems may proceed to the next step 188.

Still other embodiments of the present invention may utilize AC transform coefficients to determine filtering options. In reference to FIG. 18, embodiments similar to those described in relation to FIG. 17 are illustrated with the additional step of evaluating AC transform coefficients. In these embodiments, blocks 190 and their coding parameters 191 are identified. Motion vectors 192, motion vector target frames 193, and DC transform coefficients 194 are also compared for similarity. When similarities in these parameters exist, AC transform coefficients are compared 196 and, if they are similar, deblock filtering is skipped and the next step in the process is executed 197. If the AC coefficients are not similar, filtering is performed between the adjacent blocks and the process proceeds to the next step 197.

AC transform coefficients are more likely to be significant in larger blocks, but can also be used in methods utilizing smaller blocks, such as 4×4 blocks.
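The FIG. 17 and FIG. 18 tests can be chained into one skip decision. The source leaves "similar" unspecified, so the criteria below are illustrative assumptions: motion vectors must match exactly, the absolute DC difference must be below a hypothetical `dc_tol`, and the sum of absolute AC differences must be below a hypothetical `ac_tol`. Coefficients are assumed to be flattened lists with the DC term first.

```python
def coeff_skip(mv_j, mv_k, ref_j, ref_k, coeffs_j, coeffs_k, dc_tol=1, ac_tol=4):
    """Return True if deblocking between blocks j and k may be skipped.

    coeffs_* are flattened transform-coefficient lists whose element 0 is DC.
    """
    if mv_j != mv_k:                                  # motion vectors (192)
        return False
    if ref_j != ref_k:                                # target frames (193)
        return False
    if abs(coeffs_j[0] - coeffs_k[0]) >= dc_tol:      # DC coefficients (194)
        return False
    # AC coefficients (196): total absolute difference over the remaining terms.
    ac_diff = sum(abs(a - b) for a, b in zip(coeffs_j[1:], coeffs_k[1:]))
    return ac_diff < ac_tol
```

Ordering the tests from cheapest (motion vectors) to most expensive (full AC comparison) lets most boundaries be decided early.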

In some embodiments of the present invention, an image may be separated into various luminance and chrominance channels depending on the format of the image and the color space utilized. In the following examples, a YUV color space is described; however, many other formats and color spaces may be used in these embodiments. CIELAB, YCbCr, and other spaces may be used. In alternative embodiments, color spaces such as RGB may be used.

Some embodiments of the present invention may be described in relation to FIG. 19. In these embodiments, luminance data is extracted from the image and a luminance image is created 220. Adjacent blocks are then identified in the luminance image 222 and coding parameters for the adjacent blocks are also identified 224. As in other embodiments, the motion vectors of the adjacent blocks are compared to determine similarities 226. When the motion vectors are not similar, deblock filtering is performed 230; when the vectors are similar, further analysis is performed to determine whether the vectors point to the same reference frame 228. When the vectors point to different reference frames, deblock filtering is performed 230 between the adjacent blocks of the original image that correspond to the adjacent blocks in the luminance image. When the vectors point to the same frame, deblock filtering is skipped and the next step is executed without prior filtering 232. When filtering is performed, the next step is executed 232 after the filtering processes. Accordingly, analysis of data in the luminance channel is used to determine filtering processes in the original image, which contains both luminance and chrominance data.

In other related embodiments, illustrated in FIG. 20, a luminance image is created 240 and corresponding adjacent blocks are identified in the luminance and original images 242. Coding parameters are also identified for the luminance image blocks 244. Subsequently, motion vectors are compared to determine similarities 246. If significant similarities do not exist, filtering is performed between the adjacent blocks in the original image 252. If the motion vectors are similar, the target frames of the motion vectors are compared to determine whether the vectors point to the same reference frame. If the vectors do not point to the same reference frame, filtering is performed. If the vectors point to the same reference frame, the transform coefficients of the luminance (Y) image are compared. If the Y transform coefficients are not similar, filtering is performed. If the transform coefficients are similar, filtering is skipped and the next step 254 is executed. Likewise, the next step 254 is executed after any filtering operation.

Images may be further divided into component channels that generally correspond to luminance and chrominance channels. In some embodiments of the present invention, each channel may be filtered according to parameters unique to that channel.

As an example, embodiments may be described with reference to FIG. 21, wherein an image is divided into separate luminance (Y) and multiple chrominance (U, V) channels 260. In these embodiments, adjacent blocks are identified in images corresponding to each channel 262, 272, 282. Coding parameters, such as motion vector data, are also identified for these blocks in each channel 264, 274, 284. These coding parameters may then be compared to determine similarities as in other embodiments. In these exemplary embodiments, motion vector similarities for channel-specific motion vectors may be used to determine filtering options in each channel. When the motion vectors for a channel image are not similar 266, 276, 286, filtering is performed in that specific channel between the adjacent blocks 270, 280, 290. If the motion vectors are similar, the target reference frames are compared 268, 278, 288. When the vectors for adjacent blocks in a channel point to the same reference frame, filtering is skipped. When the vectors point to different reference frames, filtering is performed 270, 280, 290.

As in other embodiments, these channelized embodiments may utilize transform coefficient data to qualify filtering options. As shown in FIG. 22, the methods and systems described in relation to FIG. 21 may further compare channel transform coefficients 310, 322, 334. When the coefficients are not similar, filtering is performed 312, 324, 336. When the coefficients are similar, filtering is skipped.

It should be noted that various combinations of parameters may be employed in qualifying filtering operations in each channel. DC and AC transform coefficients may be utilized for these embodiments. Furthermore, various channels and combinations of channels may be used to determine filtering options and perform filtering. For example, both chrominance channels may be combined and analyzed together in some embodiments. Data and parameters from one channel may also be used to determine filtering options in another channel. For example, parameters taken from the U chrominance channel may be compared to determine filtering options in the V chrominance channel, and vice versa.

Some embodiments of the present invention relate to the Scalable Video Coding Extension of H.264/AVC. Some embodiments relate to filtering to address a problem of picture upsampling for spatially scalable video coding. More specifically, some embodiments of the present invention provide an upsampling procedure that is designed for the Scalable Video Coding extension of H.264/MPEG-4 AVC, especially for the Extended Spatial Scalable (ESS) video coding feature adopted in April 2005 by the JVT (Joint Video Team of MPEG and VCEG).

Currently, JSVM WD-1.0 [MPEG Doc. N6901], which is incorporated by reference herein, only addresses dyadic spatial scalability, that is, configurations where the ratio between the picture widths and heights (in terms of number of pixels) of two successive spatial layers equals 2. This is a limitation for more general applications, such as SD to HD scalability for broadcasting.

A tool has been proposed [MPEG Doc. m11669], which is incorporated by reference herein, that provides extended spatial scalability, that is, managing configurations in which the ratio between the picture widths and heights of two successive spatial layers is not necessarily equal to a power of 2, and in which pictures of a higher level can contain regions (typically around picture borders) that are not present in corresponding pictures of a lower level. This proposal [MPEG Doc. m11669] extended the inter-layer prediction of WD-1.0 [MPEG Doc. N6901] to more generic cases where the ratio between the higher layer and lower layer picture dimensions is not a power of 2.

Embodiments of the present invention provide a method that applies extended spatial scalability, i.e., non-dyadic scaling with a cropping window, at the picture level, which better fits the needs of more general applications. To support picture-level adaptation of spatial scalability, embodiments of the present invention provide a further refinement of the inter-layer prediction method heretofore proposed. Additionally, several issues that were not addressed by the prior proposal are addressed in these embodiments.

For the purposes of this specification and claims, the term “picture” may comprise an array of pixels, a digital image, a subdivision of a digital image, a data channel of a digital image, or another representation of image data.

FIG. 23 shows two pictures corresponding to an image picture.

Embodiments of the present invention relate to two or more successive spatial layers, a lower layer (considered as the base layer) 253 and a higher layer (considered as the enhancement layer) 251. These layers may be linked by the following geometrical relations (shown in FIG. 1). The width 250 and height 252 of enhancement layer pictures may be defined as w_(enh) and h_(enh), respectively. In the same way, the dimensions of a base layer picture may be defined as w_(base) 254 and h_(base) 256. The base layer 253 may be a subsampled 264 version of a sub-region of an enhancement layer picture 251, of dimensions w_(extract) 258 and h_(extract) 260, positioned at coordinates 262 (x_(orig), y_(orig)) in the enhancement layer picture coordinate system. Parameters (x_(orig), y_(orig), w_(extract), h_(extract), w_(base), h_(base)) define the geometrical relations between a higher layer picture 251 and a lower layer picture 253.

A problem addressed by embodiments of the present invention is the encoding/decoding of macroblocks of the enhancement layer given the decoded base layer. A macroblock of an enhancement layer may have either no corresponding base layer block (on borders of the enhancement layer picture) or one to several corresponding base layer macroblocks, as illustrated in FIG. 24. Consequently, inter-layer prediction must be handled differently than in WD-1.0 [MPEG Doc. N6901]. FIG. 2 illustrates macroblock overlapping between an upsampled base layer picture 272, wherein macroblock boundaries are marked by dashed lines 274, and an enhancement layer picture 270, wherein macroblock boundaries are marked by solid lines 276.

It has been proposed in [MPEG Doc. m11669] that w_(extract) and h_(extract) be constrained to be multiples of 16. This constraint limits picture-level adaptation. Instead, embodiments of the present invention restrict w_(extract) and h_(extract) to be multiples of 2. Embodiments of the present invention may further require x_(orig) and y_(orig) to be multiples of 2 in order to avoid the complexity of adjusting for a possible phase shift in chroma up/down sampling. The chroma phase shift problem has not been previously addressed.

The dimensions and other parameters illustrated in FIG. 23 may be represented by the following symbols or variable names.

scaled_base_left_offset=x_(orig)

scaled_base_top_offset=y_(orig)

scaled_base_right_offset=w_(enh)−x_(orig)−w_(extract)

scaled_base_bottom_offset=h_(enh)−y_(orig)−h_(extract)

scaled_base_width=w_(extract)

scaled_base_height=h_(extract)
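As a worked example of these definitions, the six variables can be computed directly from the FIG. 23 geometry. The helper below is a hypothetical sketch; it also enforces the multiple-of-2 restriction on x_(orig), y_(orig), w_(extract) and h_(extract) described above. The example dimensions (a 1920×1088 enhancement picture with a 1280×720 window at (320, 184)) are illustrative only.

```python
def scaled_base_params(w_enh, h_enh, x_orig, y_orig, w_extract, h_extract):
    """Map the FIG. 23 geometry onto the scaled_base_* variables."""
    # Even offsets and window sizes avoid chroma phase-shift adjustments.
    for v in (x_orig, y_orig, w_extract, h_extract):
        if v % 2:
            raise ValueError("x_orig, y_orig, w_extract and h_extract must be even")
    return {
        "scaled_base_left_offset": x_orig,
        "scaled_base_top_offset": y_orig,
        "scaled_base_right_offset": w_enh - x_orig - w_extract,
        "scaled_base_bottom_offset": h_enh - y_orig - h_extract,
        "scaled_base_width": w_extract,
        "scaled_base_height": h_extract,
    }

params = scaled_base_params(1920, 1088, 320, 184, 1280, 720)
# right offset = 1920 - 320 - 1280 = 320; bottom offset = 1088 - 184 - 720 = 184
```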

Inter-Layer Motion Prediction

A given high layer macroblock can exploit inter-layer prediction using scaled base layer motion data, using either “BASE_LAYER_MODE” or “QPEL_REFINEMENT_MODE”. As in WD-1.0 [MPEG Doc. N6901], these macroblock modes indicate that the motion/prediction information, including macroblock partitioning, is directly derived from the base layer. A prediction macroblock, MB_pred, can be constructed by inheriting motion data from a base layer. When using “BASE_LAYER_MODE”, the macroblock partitioning, as well as the reference indices and motion vectors, are those of the prediction macroblock MB_pred. “QPEL_REFINEMENT_MODE” is similar, but with a quarter-sample motion vector refinement.

It has been proposed to derive MB_pred in the following four steps:

for each 4×4 block of MB_pred, inheritance of motion data from the base layer motion data,

partitioning choice for each 8×8 block of MB_pred,

mode choice for MB_pred, and

motion vector scaling.

However, embodiments of the present invention provide modifications in several equations to support picture-level adaptation.

4×4 Block Inheritance

FIG. 25 illustrates a 4×4 block b 280 with four corners 281, 282, 283 and 284. The process consists of checking each of the four corners 281, 282, 283 and 284 of the block. Let (x, y) be the position of a corner pixel c in the high layer coordinate system. Let (x_(base), y_(base)) be the corresponding position in the base layer coordinate system, defined as follows:

$\begin{cases} x_{base} = \dfrac{\left[ (x - x_{orig}) \cdot w_{base} + w_{extract}/2 \right]}{w_{extract}} \\ y_{base} = \dfrac{\left[ (y - y_{orig}) \cdot h_{base} + h_{extract}/2 \right]}{h_{extract}} \end{cases} \qquad (1)$

The co-located macroblock of pixel (x, y) is then the base layer macroblock that contains pixel (x_(base), y_(base)). In the same way, the co-located 8×8 block of pixel (x, y) is the base layer 8×8 block containing pixel (x_(base), y_(base)), and the co-located 4×4 block of pixel (x, y) is the base layer 4×4 block containing pixel (x_(base), y_(base)).
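Equation (1) maps an enhancement-layer pixel to its base-layer position with round-to-nearest, and the co-located macroblock, 8×8 and 4×4 blocks then follow by integer division of that position. The sketch assumes integer arithmetic with floor division and a pixel inside the cropping window (x ≥ x_(orig), y ≥ y_(orig)); the function names are hypothetical.

```python
def to_base(x, y, x_orig, y_orig, w_extract, h_extract, w_base, h_base):
    """Equation (1): enhancement-layer pixel (x, y) -> base-layer (x_base, y_base)."""
    # Adding half the divisor before dividing rounds to the nearest base-layer pixel.
    x_base = ((x - x_orig) * w_base + w_extract // 2) // w_extract
    y_base = ((y - y_orig) * h_base + h_extract // 2) // h_extract
    return x_base, y_base

def colocated_blocks(x_base, y_base):
    """(column, row) of the base-layer 16x16, 8x8 and 4x4 blocks containing the pixel."""
    return {size: (x_base // size, y_base // size) for size in (16, 8, 4)}

# 2/3 scaling: pixel (100, 37) maps to (67, 25), since 100*2/3 = 66.67 and 37*2/3 = 24.67.
xb, yb = to_base(100, 37, 0, 0, 480, 480, 320, 320)
blocks = colocated_blocks(xb, yb)   # {16: (4, 1), 8: (8, 3), 4: (16, 6)}
```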

The motion data inheritance process for b may be described as follows:

for each corner c, the reference index r(c,listx) and motion vectormv(c,listx) of each list listx (listx=list0 or list1) are set to thoseof the co-located base layer 4×4 block

for each corner, if the co-located macroblock does not exist or is in intra mode, then b is set as an intra block

else, for each list listx

- if none of the corners uses this list, no reference index and motion vector for this list are set to b
- else:
    - the reference index r_(b)(listx) set for b is the minimum of the existing reference indices of the 4 corners:

$\begin{matrix}{{r_{b}({listx})} = {\min\limits_{c}( {r( {c,{listx}} )} )}} & (2)\end{matrix}$

    - the motion vector mv_(b)(listx) set for b is the mean of the existing motion vectors of the 4 corners having the reference index r_(b)(listx).
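The 4×4 inheritance rules can be sketched as follows. The corner representation is an assumption made for illustration: each corner is `None` when its co-located base-layer macroblock is missing or intra, otherwise a dict mapping `"list0"`/`"list1"` to `(ref_idx, (mvx, mvy))` or to `None` when that list is unused. The source does not state how the mean motion vector is rounded, so the use of Python's `round()` is also an assumption.

```python
def inherit_4x4(corners):
    """Derive motion data for a 4x4 block b from its four corner samples.

    Returns "INTRA", or a dict mapping each list to (ref_idx, mv) or None.
    """
    # Any corner whose co-located macroblock is missing or intra makes b intra.
    if any(c is None for c in corners):
        return "INTRA"
    result = {}
    for listx in ("list0", "list1"):
        used = [c[listx] for c in corners if c.get(listx) is not None]
        if not used:
            result[listx] = None          # no reference index / MV for this list
            continue
        r_b = min(ref for ref, _ in used)                    # Equation (2)
        mvs = [mv for ref, mv in used if ref == r_b]
        # Mean of the corner MVs that share r_b (rounding rule is an assumption).
        mean = tuple(round(sum(v[i] for v in mvs) / len(mvs)) for i in (0, 1))
        result[listx] = (r_b, mean)
    return result
```

For example, corners with list0 data (1, (4, 0)), (0, (2, 2)), (0, (4, 2)) and (2, (8, 8)) give r_(b)(list0) = 0 and a mean vector of (3, 2) taken over the two corners that use reference index 0.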

8×8 Partitioning Choice

Once the motion data of each 4×4 block has been set, a merging process is necessary in order to determine the actual partitioning of the 8×8 block it belongs to and to avoid forbidden configurations. In the following, the 4×4 blocks of an 8×8 block are identified as indicated in FIG. 26.

For each 8×8 block B, the following process may be applied:

if all four 4×4 blocks have been classified as intra blocks, B is considered an intra block.

else, B partitioning choice is achieved:

- The following process for assigning the same reference indices to each 4×4 block is applied, for each list listx:
    - if no 4×4 block uses this list, no reference index and motion vector of this list are set to B
    - else:
        - the reference index r_(B)(listx) for B is computed as the minimum of the existing reference indices of the four 4×4 blocks:

$\begin{matrix}{{r_{B}({listx})} = {\min\limits_{b}( {r_{b}({listx})} )}} & (3)\end{matrix}$

        - the mean motion vector mv_(mean)(listx) of the 4×4 blocks having the same reference index r_(B)(listx) is computed
        - 4×4 blocks that (1) are classified as intra blocks, (2) do not use this list, or (3) have a reference index r_(b)(listx) different from r_(B)(listx) are enforced to have r_(B)(listx) and mv_(mean)(listx) as reference index and motion vector.

- Then the choice of the partitioning mode for B is made. Two 4×4 blocks are considered identical if their motion vectors are identical. The merging process is applied as follows:
    - if b₁ is identical to b₂ and b₃ is identical to b₄, then
        - if b₁ is identical to b₃, then BLK_8×8 is chosen
        - else BLK_8×4 is chosen
    - else if b₁ is identical to b₃ and b₂ is identical to b₄, then BLK_4×8 is chosen
    - else BLK_4×4 is chosen
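The two stages of the 8×8 partitioning choice (reference-index harmonization for one list, then merging) can be sketched as below. Assumptions, all labeled: the FIG. 26 numbering is taken to be raster order (b₁, b₂ on top; b₃, b₄ below), which is consistent with b₁=b₂ and b₃=b₄ yielding BLK_8×4; each 4×4 entry is `(ref_idx, (mvx, mvy))` or `None` for intra/unused; and the mean vector is rounded with `round()`, since the source gives no rounding rule. Function names are hypothetical.

```python
def harmonize_list(blocks):
    """Reference-index harmonization for one list of an 8x8 block B.

    blocks: the four 4x4 entries b1..b4, each (ref_idx, (mvx, mvy)) or None
    when the 4x4 block is intra or does not use this list.
    """
    used = [b for b in blocks if b is not None]
    if not used:
        return [None] * 4                      # no reference index / MV for B
    r_B = min(ref for ref, _ in used)          # Equation (3)
    mvs = [mv for ref, mv in used if ref == r_B]
    # Mean MV of the blocks sharing r_B; rounding with round() is an assumption.
    mv_mean = tuple(round(sum(mv[i] for mv in mvs) / len(mvs)) for i in (0, 1))
    # Intra, unused, or mismatched-reference blocks are forced to (r_B, mv_mean).
    return [b if b is not None and b[0] == r_B else (r_B, mv_mean) for b in blocks]

def choose_8x8_partition(b1, b2, b3, b4):
    """Merging rule: 4x4 blocks are identical when their motion data are equal."""
    if b1 == b2 and b3 == b4:
        return "BLK_8x8" if b1 == b3 else "BLK_8x4"
    if b1 == b3 and b2 == b4:
        return "BLK_4x8"
    return "BLK_4x4"
```

After harmonization all four blocks share one reference index per list, so comparing whole `(ref_idx, mv)` entries in the merge step is equivalent to comparing motion vectors.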

Prediction Macroblock Mode Choice

In some embodiments, a process may be applied to determine an MB_pred mode. In the following, the 8×8 blocks 301-304 of the macroblock 300 are identified as indicated in FIG. 27.

Two 8×8 blocks are considered as identical blocks if:

One or both of the two 8×8 blocks are classified as intra blocks or

The partitioning mode of both blocks is BLK_8×8, and the reference indices and motion vectors of list0 and list1 of each 8×8 block, if they exist, are identical.

The mode choice is done using the following process:

if all 8×8 blocks are classified as intra blocks, then MB_pred is classified as an INTRA macroblock

else, MB_pred is an INTER macroblock. Its mode choice is made as follows:

- 8×8 blocks classified as intra are enforced to BLK_8×8 partitioning. Their reference indices and motion vectors are computed as follows. Let B_(INTRA) be such an 8×8 block.
- for each list listx:
    - if no 8×8 block uses this list, no reference index and motion vector of this list are assigned to B_(INTRA)
    - else, the following steps are applied:
        - a reference index r_(min)(listx) is computed as the minimum of the existing reference indices of the 8×8 blocks:

$\begin{matrix}{{r_{\min}({listx})} = {\min\limits_{B}( {r_{B}({listx})} )}} & (4)\end{matrix}$

        - a mean motion vector mv_(mean)(listx) of the 4×4 blocks having the same reference index r_(min)(listx) is computed
        - r_(min)(listx) is assigned to B_(INTRA), and each 4×4 block of B_(INTRA) is enforced to have r_(min)(listx) and mv_(mean)(listx) as reference index and motion vector.

- Then the mode choice for MB_pred is made. Two 8×8 blocks are considered identical if their partitioning mode is BLK_8×8 and the reference indices and motion vectors of list0 and list1 of each 8×8 block, if they exist, are identical. The merging process is applied as follows:
    - if B1 is identical to B2 and B3 is identical to B4, then
        - if B1 is identical to B3, then MODE_16×16 is chosen.
        - else MODE_16×8 is chosen.
    - else if B1 is identical to B3 and B2 is identical to B4, then MODE_8×16 is chosen.
    - else MODE_8×8 is chosen.
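The macroblock-level merge can be sketched in the same way. This sketch assumes intra 8×8 blocks have already been converted to BLK_8×8 with r_(min) and mv_(mean) as described above (unless all four are intra), and represents each 8×8 block as a hypothetical `(partition, motion_data)` tuple, where `motion_data` holds the list0/list1 reference indices and motion vectors. Blocks B1 to B4 are assumed to follow the FIG. 27 order.

```python
def choose_mb_mode(blocks):
    """Mode choice for MB_pred from four 8x8 descriptors B1..B4.

    Each entry is "INTRA" (only meaningful when all four are intra) or a
    tuple (partition, motion_data) after intra blocks have been enforced.
    """
    if all(b == "INTRA" for b in blocks):
        return "INTRA"

    def same(a, b):
        # Identical: both BLK_8x8 with equal reference indices and motion vectors.
        return a[0] == "BLK_8x8" and b[0] == "BLK_8x8" and a[1] == b[1]

    B1, B2, B3, B4 = blocks
    if same(B1, B2) and same(B3, B4):
        return "MODE_16x16" if same(B1, B3) else "MODE_16x8"
    if same(B1, B3) and same(B2, B4):
        return "MODE_8x16"
    return "MODE_8x8"
```

Any 8×8 block that keeps a sub-8×8 partition can never be "identical" to a neighbor, so it forces MODE_8×8.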

Motion Vector Scaling

A motion vector rescaling may be applied to every existing motion vector of the prediction macroblock MB_pred as derived above. A motion vector mv=(d_(x), d_(y)) may be scaled to the vector mv_(s)=(d_(sx), d_(sy)) using the following equations:

$\begin{cases} d_{sx} = \dfrac{d_{x} \cdot w_{extract} + \operatorname{sign}[d_{x}] \cdot w_{base}/2}{w_{base}} + 4 \cdot (x_{orig,r} - x_{orig}) \\ d_{sy} = \dfrac{d_{y} \cdot h_{extract} + \operatorname{sign}[d_{y}] \cdot h_{base}/2}{h_{base}} + 4 \cdot (y_{orig,r} - y_{orig}) \end{cases} \qquad (5)$

in which sign[x] is equal to 1 when x is positive, −1 when x is negative, and 0 when x equals 0. The symbols with subscript “r” represent the geometrical parameters of the corresponding reference picture.
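Equation (5) can be sketched in integer arithmetic as follows. Assumptions: the division truncates toward zero (C-style), which together with the sign-dependent offset rounds half away from zero; the factor 4 is read as indicating quarter-sample motion vector units; and w_(base), h_(base) are positive and even. The function names are hypothetical.

```python
def sign(v):
    """sign[x]: 1 for positive, -1 for negative, 0 for zero."""
    return (v > 0) - (v < 0)

def div_trunc(n, d):
    """Integer division truncating toward zero (C semantics), assuming d > 0."""
    q = abs(n) // d
    return q if n >= 0 else -q

def scale_mv(dx, dy, w_extract, h_extract, w_base, h_base,
             x_orig, y_orig, x_orig_r, y_orig_r):
    """Equation (5): scale a base-layer MV to the enhancement-layer resolution."""
    dsx = div_trunc(dx * w_extract + sign(dx) * (w_base // 2), w_base) + 4 * (x_orig_r - x_orig)
    dsy = div_trunc(dy * h_extract + sign(dy) * (h_base // 2), h_base) + 4 * (y_orig_r - y_orig)
    return dsx, dsy

# 1.5x scaling: 3 * 1.5 = 4.5 rounds to 5, and -1 * 1.5 = -1.5 rounds to -2.
print(scale_mv(3, -1, 480, 480, 320, 320, 0, 0, 0, 0))   # (5, -2)
```

The second term shifts the vector when the reference picture's cropping window is positioned differently from the current picture's.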

Inter-Layer Texture Prediction

Texture Upsampling

In some embodiments of the present invention, inter-layer texture prediction may be based on the same principles as inter-layer motion prediction. Base layer texture upsampling may be achieved by applying two-lobed or three-lobed Lanczos-windowed sinc functions. These filters are considered to offer the best compromise in terms of reduction of aliasing, sharpness, and minimal ringing. The two-lobed Lanczos-windowed sinc function may be defined as follows:

$\begin{matrix}{{{Lanczos}\; 2(x)} = \{ \begin{matrix}{{\frac{\sin ( {\pi \; x} )}{\pi \; x}\frac{\sin ( {\pi \frac{x}{2}} )}{\pi \frac{x}{2}}},} & {{x} < 2} \\{0,} & {{x} \geq 2}\end{matrix} } & (6)\end{matrix}$

This upsampling step may be processed either on the full frame or block by block. For intra texture prediction, repetitive padding is used at frame boundaries. For residual prediction, repetitive padding is used at block boundaries (4×4 or 8×8, depending on the transform).
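Equation (6) translates directly into a small function. This is a minimal sketch; the only addition beyond the equation is the explicit value at x = 0, where both sinc factors reach their limit of 1.

```python
import math

def lanczos2(x: float) -> float:
    """Two-lobed Lanczos-windowed sinc of Equation (6)."""
    if x == 0:
        return 1.0                 # limit of sin(pi x)/(pi x) as x -> 0
    if abs(x) >= 2:
        return 0.0                 # outside the two-lobe support
    px = math.pi * x
    # sinc(x) multiplied by the Lanczos window sinc(x/2).
    return (math.sin(px) / px) * (math.sin(px / 2) / (px / 2))
```

Sampling this kernel at the distances between an output position and its four neighboring base-layer samples, then scaling the weights to sum to 128, gives integer taps of the kind listed in Table 1; the published table may differ from a naive rounding by one unit in some entries.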

In an exemplary embodiment, according to the Lanczos2 function, the following sixteen 4-tap upsampling filters are defined in Table 1 below for the 16 different interpolation phases, in units of one-sixteenth sample spacing relative to the sample grid of the corresponding component in the base layer picture.

For a luma sample in the current layer at position (x, y), the phase shift relative to the corresponding samples in the base layer picture shall be derived as:

$\begin{cases} p_{x,L} = \dfrac{\left[ (x - x_{orig}) \cdot w_{base} \cdot 16 \right]}{w_{extract}} - 16 \cdot \left[ \dfrac{(x - x_{orig}) \cdot w_{base}}{w_{extract}} \right] \\ p_{y,L} = \dfrac{\left[ (y - y_{orig}) \cdot h_{base} \cdot 16 \right]}{h_{extract}} - 16 \cdot \left[ \dfrac{(y - y_{orig}) \cdot h_{base}}{h_{extract}} \right] \end{cases} \qquad (7)$

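TABLE 1: 4-tap interpolation filters for upsampling (interpolation filter coefficients)

| Phase | e[−1] | e[0] | e[1] | e[2] |
|---|---|---|---|---|
| 0 | 0 | 128 | 0 | 0 |
| 1 | −4 | 127 | 5 | 0 |
| 2 | −8 | 124 | 13 | −1 |
| 3 | −10 | 118 | 21 | −1 |
| 4 | −11 | 111 | 30 | −2 |
| 5 | −11 | 103 | 40 | −4 |
| 6 | −10 | 93 | 50 | −5 |
| 7 | −9 | 82 | 61 | −6 |
| 8 | −8 | 72 | 72 | −8 |
| 9 | −6 | 61 | 82 | −9 |
| 10 | −5 | 50 | 93 | −10 |
| 11 | −4 | 40 | 103 | −11 |
| 12 | −2 | 30 | 111 | −11 |
| 13 | −1 | 21 | 118 | −10 |
| 14 | −1 | 13 | 124 | −8 |
| 15 | 0 | 5 | 127 | −4 |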
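Table 1 and the luma phase rule of Equation (7) can be captured together: the phase selects a row of taps and the integer part gives the base-layer sample X[n] used in Equation (15). The sketch assumes floor division on non-negative offsets (x ≥ x_(orig)); `TABLE_1` and `luma_phase` are hypothetical names. The tests also check two properties visible in the table: every row sums to 128 (matching the division by 128 in the interpolation equations), and row p is row 16−p reversed.

```python
# Table 1: taps (e[-1], e[0], e[1], e[2]) indexed by phase 0..15.
TABLE_1 = [
    (0, 128, 0, 0), (-4, 127, 5, 0), (-8, 124, 13, -1), (-10, 118, 21, -1),
    (-11, 111, 30, -2), (-11, 103, 40, -4), (-10, 93, 50, -5), (-9, 82, 61, -6),
    (-8, 72, 72, -8), (-6, 61, 82, -9), (-5, 50, 93, -10), (-4, 40, 103, -11),
    (-2, 30, 111, -11), (-1, 21, 118, -10), (-1, 13, 124, -8), (0, 5, 127, -4),
]

def luma_phase(x, x_orig, w_base, w_extract):
    """Equation (7), horizontal component.

    Returns (n, phase): n is the base-layer sample index used as X[n], and
    phase (0..15) selects the row of TABLE_1.
    """
    n = ((x - x_orig) * w_base) // w_extract
    phase = ((x - x_orig) * w_base * 16) // w_extract - 16 * n
    return n, phase

# 2/3 scaling: x = 1 lies 10/16 of a sample past base sample 0.
n, p = luma_phase(1, 0, 320, 480)   # (0, 10) -> taps (-5, 50, 93, -10)
```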

For a chroma sample in the current layer at position (x_(c), y_(c)) in the chroma sample coordinate system, the phase shift relative to the corresponding samples in the base layer picture may be derived as:

$\begin{matrix}\{ \begin{matrix}{p_{x,c} = {\frac{\lbrack {( {x_{c} - x_{{orig},c}} ) \cdot w_{{base},c} \cdot 16} \rbrack}{w_{{extract},c}} - {16 \cdot \lbrack \frac{( {x_{c} - x_{{orig},c}} ) \cdot w_{{base},c}}{w_{{extract},c}} \rbrack}}} \\{p_{y,c} = {\frac{\lbrack {( {y_{c} - y_{{orig},c}} ) \cdot h_{{base},c} \cdot 16} \rbrack}{h_{{extract},c}} - {16 \cdot \lbrack \frac{( {y_{c} - y_{{orig},c}} ) \cdot h_{{base},c}}{h_{{extract},c}} \rbrack}}}\end{matrix}  & (8)\end{matrix}$

in which

w _(base,c) =w _(base)·BasePicMbWidthC/16  (9)

w _(extract,c) =w _(extract)·MbWidthC/16  (10)

h _(base,c) =h _(base)·BasePicMbHeightC/16  (11)

h _(extract,c) =h _(extract)·MbHeightC/16  (12)

x _(orig,c) =x _(orig)·MbWidthC/16  (13)

pi y_(orig,c) =y _(orig)·MbHeightC/16

According to each phase shift derived, a 4-tap filter can be chosen fromTable 1 for interpolation.
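A minimal Python sketch of the phase derivation of Equation (7) and the Table 1 lookup is shown below. It assumes the brackets in Equation (7) denote the floor function; the helper names `base_pos_eq7` and `select_filter` are hypothetical. The same function can be reused for chroma by passing the chroma quantities of Equations (9) to (14).

```python
# Table 1 rows (e[-1], e[0], e[1], e[2]), indexed by phase 0..15.
TABLE1 = [
    (0, 128, 0, 0), (-4, 127, 5, 0), (-8, 124, 13, -1), (-10, 118, 21, -1),
    (-11, 111, 30, -2), (-11, 103, 40, -4), (-10, 93, 50, -5), (-9, 82, 61, -6),
    (-8, 72, 72, -8), (-6, 61, 82, -9), (-5, 50, 93, -10), (-4, 40, 103, -11),
    (-2, 30, 111, -11), (-1, 21, 118, -10), (-1, 13, 124, -8), (0, 5, 127, -4),
]


def base_pos_eq7(x: int, x_orig: int, w_base: int, w_extract: int):
    """Return (n, phase): base sample index n and the 1/16-sample phase of Eq. (7).

    Brackets are read as floor. Python's // floors, so the phase stays in
    0..15 even when x < x_orig.
    """
    a = (x - x_orig) * w_base
    n = a // w_extract
    phase = (16 * a) // w_extract - 16 * n
    return n, phase


def select_filter(phase: int):
    """Pick the 4-tap row of Table 1 for a given phase."""
    return TABLE1[phase]


# Example: 1.5x horizontal upsampling (w_base = 240, w_extract = 360).
for x in range(4):
    n, ph = base_pos_eq7(x, 0, 240, 360)
    print(x, n, ph, select_filter(ph))
```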

Inter-Layer Intra Texture Prediction

In WD-1.0 [MPEG Doc. N6901], the I_BL mode requires all the corresponding base-layer macroblocks to be intra-coded. In embodiments of the present invention, this requirement may be relaxed to allow the corresponding base-layer macroblocks to be inter-coded or non-existent.

For generating the intra prediction signal for macroblocks coded in I_BL mode, the co-located blocks (if any) of the base layer signals are directly deblocked and interpolated. For 4 input samples (X[n−1], X[n], X[n+1], X[n+2]), the output value Y of a 4-tap interpolation filter shall be derived as:

$Y = \mathrm{Clip1}_Y\left( \left( e[-1] X[n-1] + e[0] X[n] + e[1] X[n+1] + e[2] X[n+2] + 64 \right) / 128 \right) \qquad (15)$

with

$\mathrm{Clip1}_Y(x) = \min\left( \max(0, x), (1 \ll \mathrm{BitDepth}_Y) - 1 \right)$

in which BitDepth_Y represents the bit depth of the luma channel data, for a luma sample, or

$Y = \mathrm{Clip1}_C\left( \left( e[-1] X[n-1] + e[0] X[n] + e[1] X[n+1] + e[2] X[n+2] + 64 \right) / 128 \right) \qquad (16)$

with

$\mathrm{Clip1}_C(x) = \min\left( \max(0, x), (1 \ll \mathrm{BitDepth}_C) - 1 \right)$

in which BitDepth_C represents the bit depth of the chroma channel data, for a chroma sample.

Because rounding operations are applied in Equations 15 and 16, the result depends on the filtering order, which may be specified as either horizontal first or vertical first. It is recommended that filter operations be performed in the horizontal direction first, followed by filter operations in the vertical direction. This upsampling process is invoked only when extended_spatial_scalability, defined below, is enabled.
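The one-dimensional filter step of Equations 15 and 16 can be sketched in Python as follows; a separable 2-D upsampler would apply it once per direction, in the recommended order. The function names are hypothetical. The sketch uses floor division for "/"; because a negative sum is clipped to 0 either way, floor and truncation give the same output here.

```python
def clip1(x: int, bit_depth: int) -> int:
    """Clip1_Y / Clip1_C: clamp to the range [0, (1 << bit_depth) - 1]."""
    return min(max(0, x), (1 << bit_depth) - 1)


def interp4_texture(samples, taps, bit_depth: int = 8) -> int:
    """Eq. (15)/(16) for one output sample.

    samples = (X[n-1], X[n], X[n+1], X[n+2])
    taps    = (e[-1], e[0], e[1], e[2]), one row of Table 1 (gain 128)
    """
    acc = sum(e * s for e, s in zip(taps, samples)) + 64  # +64 rounds /128
    return clip1(acc // 128, bit_depth)


# Half-sample phase (row 8 of Table 1) between samples 20 and 30.
print(interp4_texture((10, 20, 30, 40), (-8, 72, 72, -8)))  # 25
```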

After the upsampling filter operation, constant values shall be used to fill the image regions outside of the cropping window. The constant shall be (1 << (BitDepth_Y − 1)) for luma or (1 << (BitDepth_C − 1)) for chroma.

Inter-Layer Residual Prediction

Similar to inter-layer intra texture prediction, the same 4-tap filters, or other filters, may be applied when upsampling the base layer residuals, but with rounding and clipping functions that differ from those in Equations 15 and 16.

For 4 input residual samples (X[n−1], X[n], X[n+1], X[n+2]), the output value Y of the filter shall be derived as:

$Y = \mathrm{Clip1}_{Y,r}\left( \left( e[-1] X[n-1] + e[0] X[n] + e[1] X[n+1] + e[2] X[n+2] \right) / 128 \right) \qquad (17)$

for a luma residual sample, or

$Y = \mathrm{Clip1}_{C,r}\left( \left( e[-1] X[n-1] + e[0] X[n] + e[1] X[n+1] + e[2] X[n+2] \right) / 128 \right) \qquad (18)$

for a chroma residual sample.

The clipping functions for residual upsampling are defined as:

$\mathrm{Clip1}_{Y,r}(x) = \mathrm{Clip3}\left( 1 - (1 \ll \mathrm{BitDepth}_Y), (1 \ll \mathrm{BitDepth}_Y) - 1, x \right) \qquad (19)$

$\mathrm{Clip1}_{C,r}(x) = \mathrm{Clip3}\left( 1 - (1 \ll \mathrm{BitDepth}_C), (1 \ll \mathrm{BitDepth}_C) - 1, x \right) \qquad (20)$

where $\mathrm{Clip3}(a, b, x) = \min(\max(a, x), b)$.

Similarly, after the upsampling filter operation, constant values shall be used to fill the pixel positions where residual prediction is not available, including image regions outside of the cropping window. The constant shall be 0 for all color components.
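For residuals, Equations 17 to 20 drop the rounding offset and clip to a signed range. A short Python sketch follows. It assumes "/" has its H.264 meaning (integer division truncating toward zero), which matters for negative residuals; the helper names are hypothetical.

```python
def clip3(a: int, b: int, x: int) -> int:
    """Clip3(a, b, x) = min(max(a, x), b)."""
    return min(max(a, x), b)


def div_trunc(n: int, d: int) -> int:
    """Integer division truncating toward zero (assumed meaning of '/')."""
    q = abs(n) // abs(d)
    return q if (n >= 0) == (d > 0) else -q


def interp4_residual(samples, taps, bit_depth: int = 8) -> int:
    """Eq. (17)/(18) with the clipping of Eq. (19)/(20).

    No rounding offset is added; the result is clipped to
    [1 - (1 << bit_depth), (1 << bit_depth) - 1].
    """
    acc = sum(e * s for e, s in zip(taps, samples))
    limit = (1 << bit_depth) - 1
    return clip3(-limit, limit, div_trunc(acc, 128))


# A small negative sum truncates to 0, whereas floor division would give -1.
print(interp4_residual((0, -1, 0, 0), (-8, 72, 72, -8)))
```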

Changes in Syntax and Semantics

Syntax in Tabular Form

Embodiments of the present invention may utilize the following changes, which are indicated below in large bold text. The main changes are the addition, in the sequence parameter set, of a symbol, extended_spatial_scalability, and, accordingly, of four parameters:

scaled_base_left_offset_divided_by_two,

scaled_base_top_offset_divided_by_two,

scaled_base_right_offset_divided_by_two,

scaled_base_bottom_offset_divided_by_two

in the sequence parameter set and in slice_data_in_scalable_extension( ), related to the geometrical transformation to be applied in the base layer upsampling process.

Sequence parameter set syntax in scalable extension

    seq_parameter_set_rbsp( ) {                                  C    Descriptor
        . . .
        extended_spatial_scalability                             0    u(2)
        if( extended_spatial_scalability == 1 ) {
            scaled_base_left_offset_divided_by_two               0    ue(v)
            scaled_base_top_offset_divided_by_two                0    ue(v)
            scaled_base_right_offset_divided_by_two              0    ue(v)
            scaled_base_bottom_offset_divided_by_two             0    ue(v)
        }
        . . .
        rbsp_trailing_bits( )                                    0
    }

Slice Data Syntax in Scalable Extension

    slice_data_in_scalable_extension( ) {                        C    Descriptor
        if( extended_spatial_scalability == 2 ) {
            scaled_base_left_offset_divided_by_two               2    ue(v)
            scaled_base_top_offset_divided_by_two                2    ue(v)
            scaled_base_right_offset_divided_by_two              2    ue(v)
            scaled_base_bottom_offset_divided_by_two             2    ue(v)
        }
        if( extended_spatial_scalability )
            HalfSpatResBaseFlag = 0
        else
            HalfSpatResBaseFlag = half_spat_res_base_pic( )
        . . .
    }

Macroblock Layer Syntax in Scalable Extension

    macroblock_layer_in_scalable_extension( ) {                  C    Descriptor
        if( base_id_plus1 != 0 && adaptive_prediction_flag ) {
            base_mode_flag                                       2    ae(v)
            if( ! base_mode_flag && ( HalfSpatResBaseFlag || extended_spatial_scalability ) &&
                ! intra_base_mb( CurrMbAddr ) )
                base_mode_refinement_flag                        2    ae(v)
        }
        . . .
    }

Semantics

Sequence Parameter Set Syntax in Scalable Extension

extended_spatial_scalability specifies the presence of syntax elements related to geometrical parameters for the base layer upsampling. When extended_spatial_scalability is equal to 0, no geometrical parameter is present in the bitstream. When extended_spatial_scalability is equal to 1, geometrical parameters are present in the sequence parameter set. When extended_spatial_scalability is equal to 2, geometrical parameters are present in slice_data_in_scalable_extension. The value 3 is reserved for extended_spatial_scalability. When extended_spatial_scalability is not present, it shall be inferred to be equal to 0.

scaled_base_left_offset_divided_by_two specifies half of the horizontal offset between the upper-left pixel of the upsampled base layer picture and the upper-left pixel of the current picture. When scaled_base_left_offset_divided_by_two is not present, it shall be inferred to be equal to 0.

scaled_base_top_offset_divided_by_two specifies half of the vertical offset between the upper-left pixel of the upsampled base layer picture and the upper-left pixel of the current picture. When scaled_base_top_offset_divided_by_two is not present, it shall be inferred to be equal to 0.

scaled_base_right_offset_divided_by_two specifies half of the horizontal offset between the bottom-right pixel of the upsampled base layer picture and the bottom-right pixel of the current picture. When scaled_base_right_offset_divided_by_two is not present, it shall be inferred to be equal to 0.

scaled_base_bottom_offset_divided_by_two specifies half of the vertical offset between the bottom-right pixel of the upsampled base layer picture and the bottom-right pixel of the current picture. When scaled_base_bottom_offset_divided_by_two is not present, it shall be inferred to be equal to 0.

All geometrical parameters are specified as unsigned integers in units of one-sample spacing relative to the luma sampling grid in the current layer. Several additional symbols (scaled_base_left_offset, scaled_base_top_offset, scaled_base_right_offset, scaled_base_bottom_offset, scaled_base_width, scaled_base_height) are then defined based on the geometrical parameters:

scaled_base_left_offset=2·scaled_base_left_offset_divided_by_two

scaled_base_top_offset=2·scaled_base_top_offset_divided_by_two

scaled_base_right_offset=2·scaled_base_right_offset_divided_by_two

scaled_base_bottom_offset=2·scaled_base_bottom_offset_divided_by_two

scaled_base_width=PicWidthInMbs·16−scaled_base_left_offset−scaled_base_right_offset

scaled_base_height=PicHeightInMapUnits·16−scaled_base_top_offset−scaled_base_bottom_offset
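The derived symbols above are simple arithmetic on the decoded syntax elements. A minimal Python sketch, with a hypothetical `ScaledBase` container and `derive_scaled_base` helper, is:

```python
from dataclasses import dataclass


@dataclass
class ScaledBase:
    left_offset: int
    top_offset: int
    right_offset: int
    bottom_offset: int
    width: int
    height: int


def derive_scaled_base(left_div2: int, top_div2: int, right_div2: int,
                       bottom_div2: int, pic_width_in_mbs: int,
                       pic_height_in_map_units: int) -> ScaledBase:
    """Derive the scaled_base_* symbols from the *_divided_by_two elements."""
    left, top = 2 * left_div2, 2 * top_div2
    right, bottom = 2 * right_div2, 2 * bottom_div2
    return ScaledBase(
        left_offset=left,
        top_offset=top,
        right_offset=right,
        bottom_offset=bottom,
        width=pic_width_in_mbs * 16 - left - right,
        height=pic_height_in_map_units * 16 - top - bottom,
    )


# Hypothetical 1280x720 enhancement picture (80 x 45 MBs) with 16-pixel side borders.
print(derive_scaled_base(8, 0, 8, 0, 80, 45))
```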

Slice Data Syntax in Scalable Extension

The semantics of the syntax elements in the slice data are identical to those of the same syntax elements in the sequence parameter set.

Decoding Process

Decoding Process for Prediction Data

Compared to WD-1.0 [MPEG Doc. N6901], the following processes must be added. For each macroblock, the following applies:

If extended_spatial_scalability is equal to 1 or 2 and base_layer_mode_flag is equal to 1, the motion vector field, including the macroblock partitioning, is derived using the process described in Section 3. As in WD-1.0 [MPEG Doc. N6901], if all corresponding base-layer macroblocks are intra-coded, the current macroblock mode is set to I_BL.

Otherwise, if extended_spatial_scalability is equal to 1 or 2, base_layer_mode_flag is equal to 0, and base_layer_refinement is equal to 1, the base layer refinement mode is signaled. The base layer refinement mode is similar to the base layer prediction mode. The macroblock partitioning as well as the reference indices and motion vectors are derived following Section 3. However, for each motion vector, a quarter-sample motion vector refinement mvd_ref_lX (−1, 0, or +1 for each motion vector component) is additionally transmitted and added to the derived motion vectors. The rest of the process is identical to that in WD-1.0 [MPEG Doc. N6901].

Decoding Process for Subband Pictures

Compared to WD-1.0 [MPEG Doc. N6901], the following processes must be added:

If extended_spatial_scalability is equal to 1 or 2, the intra prediction signal for an MB in I_BL mode is generated by the following process.

The collocated base layer blocks/macroblocks are filtered.

The intra prediction signal is generated by interpolating the deblocked base layer blocks/macroblocks. The interpolation is performed using the process described in Section 4. The rest of the process is identical to that in WD-1.0 [MPEG Doc. N6901].

Otherwise, if extended_spatial_scalability is equal to 1 or 2 and residual_prediction_flag is equal to 1, the following applies.

-   The residual signal of the base layer blocks is upsampled and added to the residual signal of the current macroblock. The interpolation is performed using the process described in Section 4.

Changes to Loop Filter

When extended_spatial_scalability is equal to 1 or 2, a minor change should apply to the loop filter during the filter strength decision for a block in I_BL mode.

If the neighboring block is intra-coded but not in I_BL mode, the Bs is 4 (this first part is the same as in WD-1.0 [MPEG Doc. N6901]).

Otherwise, if any of the adjacent blocks has non-zero coefficients, the Bs is 2.

Otherwise, if the neighboring block is not in I_BL mode, the Bs is 1.

Otherwise, Bs is 0.

6-Tap Filter Embodiments

Some embodiments of the present invention are designed for use with the Scalable Video Coding extension of H.264/MPEG-4 AVC, especially for the Extended Spatial Scalable (ESS) video coding feature adopted in April 2005 by JVT (Joint Video Team of MPEG and VCEG).

In the current SVC design, the upsampling process is based on the quarter luma sample interpolation procedure that is specified in H.264 for inter prediction. The method inherits two drawbacks when applied to spatial scalable coding: (1) the interpolation resolution is limited to quarter samples, and (2) the half-sample interpolation must be performed in order to get to a quarter-sample position.

Some embodiments of the present invention remove these drawbacks by using (1) finer interpolation resolution and (2) direct interpolation. Consequently, these embodiments reduce the computational complexity while improving the quality of the upsampled pictures.

The upsampling technique of exemplary embodiments of the present invention is based on direct interpolation with 16 6-tap filters. The filter is selected according to the interpolation position or phase, ranging from 0 to 15 in units of one-sixteenth picture samples. The set of filters is designed to be backward compatible with the half-sample interpolation process of SVC and the half-sample luma inter prediction of H.264. Therefore, from a hardware/software implementation point of view, the technique of these embodiments can be a natural extension of H.264.

Conventional spatial scalable video coding systems typically deal with cases in which the spatial or resolution scaling factor is 2 or a power of 2. In April 2005, Extended Spatial Scalability was adopted into the SVC Joint Scalable Video Model (JSVM) to handle more generic applications in which the spatial scaling factor is not limited to a power of 2. The upsampling procedure for inter-layer texture prediction, however, is still a developing issue. During the JVT meeting in April 2005, a decision was made to temporarily adopt the quarter luma sample interpolation process specified in H.264 for texture upsampling.

In these embodiments of the present invention, the same geometric relationships that were described for the above-described embodiments in relation to FIG. 23 apply as well.

In the above-described embodiments, a set of 16 4-tap upsampling filters was defined for the 16 different interpolation phases in units of one-sixteenth sample spacing relative to the integer sample grid of the corresponding component in the base layer picture. The 4-tap filters, however, are not backward compatible with the earlier H.264 design. Consequently, these embodiments may comprise a new set of 16 6-tap filters and corresponding filtering procedures. In an exemplary embodiment, the 6-tap filters described in Table 2 may be used. In another exemplary embodiment, the 6-tap filters described in Table 3 may be used.

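TABLE 2 First exemplary 16-phase interpolation filter (6-tap)

| phase | e[−2] | e[−1] | e[0] | e[1] | e[2] | e[3] |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 32 | 0 | 0 | 0 |
| 1 | 0 | −2 | 32 | 2 | 0 | 0 |
| 2 | 1 | −3 | 31 | 4 | −1 | 0 |
| 3 | 1 | −4 | 30 | 7 | −2 | 0 |
| 4 | 1 | −4 | 28 | 9 | −2 | 0 |
| 5 | 1 | −5 | 27 | 11 | −3 | 1 |
| 6 | 1 | −5 | 25 | 14 | −3 | 0 |
| 7 | 1 | −5 | 22 | 17 | −4 | 1 |
| 8 | 1 | −5 | 20 | 20 | −5 | 1 |
| 9 | 1 | −4 | 17 | 22 | −5 | 1 |
| 10 | 0 | −3 | 14 | 25 | −5 | 1 |
| 11 | 1 | −3 | 11 | 27 | −5 | 1 |
| 12 | 0 | −2 | 9 | 28 | −4 | 1 |
| 13 | 0 | −2 | 7 | 30 | −4 | 1 |
| 14 | 0 | −1 | 4 | 31 | −3 | 1 |
| 15 | 0 | 0 | 2 | 32 | −2 | 0 |

TABLE 3 Second exemplary 16-phase interpolation filter (6-tap)

| phase | e[−2] | e[−1] | e[0] | e[1] | e[2] | e[3] |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 32 | 0 | 0 | 0 |
| 1 | 0 | −2 | 32 | 2 | 0 | 0 |
| 2 | 1 | −3 | 31 | 4 | −1 | 0 |
| 3 | 1 | −4 | 30 | 6 | −1 | 0 |
| 4 | 1 | −4 | 28 | 9 | −2 | 0 |
| 5 | 1 | −4 | 27 | 11 | −3 | 0 |
| 6 | 1 | −5 | 25 | 14 | −3 | 0 |
| 7 | 1 | −5 | 22 | 17 | −4 | 1 |
| 8 | 1 | −5 | 20 | 20 | −5 | 1 |
| 9 | 1 | −4 | 17 | 22 | −5 | 1 |
| 10 | 0 | −3 | 14 | 25 | −5 | 1 |
| 11 | 0 | −3 | 11 | 27 | −4 | 1 |
| 12 | 0 | −2 | 9 | 28 | −4 | 1 |
| 13 | 0 | −1 | 6 | 30 | −4 | 1 |
| 14 | 0 | −1 | 4 | 31 | −3 | 1 |
| 15 | 0 | 0 | 2 | 32 | −2 | 0 |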

Given a luma sample position (x, y) in the enhancement picture in units of integer luma samples, its corresponding position in the base picture (p_{x,L}, p_{y,L}) in units of one-sixteenth luma samples of the base picture can be derived as

$\begin{cases} p_{x,L}(x) = \left[ (x - x_{orig}) \cdot w_{base} \cdot R_L + \dfrac{R_L}{2} (w_{base} - w_{extract}) \right] \mathbin{//} w_{extract} \\ p_{y,L}(y) = \left[ (y - y_{orig}) \cdot h_{base} \cdot R_L + \dfrac{R_L}{2} (h_{base} - h_{extract}) \right] \mathbin{//} h_{extract} \end{cases} \qquad (21)$

in which R_L = 16 (for one-sixteenth-sample resolution interpolation); as in FIG. 23, (x_orig, y_orig) represents the position of the upper-left corner of the cropping window in the current picture in units of single luma samples of the current picture; (w_base, h_base) is the resolution of the base picture in units of single luma samples of the base picture; (w_extract, h_extract) is the resolution of the cropping window in units of single luma samples of the current picture; and “//” represents a simplified division operator.

Similarly, given a chroma sample position (x_c, y_c) in the enhancement picture in units of single chroma samples, its corresponding position in the base picture (p_{x,c}, p_{y,c}) in units of one-sixteenth chroma samples of the base picture can be derived as

$\begin{cases} p_{x,c}(x_c) = \left[ (x_c - x_{orig,c}) \cdot w_{base,c} \cdot R_C + \dfrac{R_C}{4} (2 + p_{enh,x}) \, w_{base,c} - \dfrac{R_C}{4} (2 + p_{base,x}) \, w_{extract,c} \right] \mathbin{//} w_{extract,c} \\ p_{y,c}(y_c) = \left[ (y_c - y_{orig,c}) \cdot h_{base,c} \cdot R_C + \dfrac{R_C}{4} (2 + p_{enh,y}) \, h_{base,c} - \dfrac{R_C}{4} (2 + p_{base,y}) \, h_{extract,c} \right] \mathbin{//} h_{extract,c} \end{cases} \qquad (22)$

in which R_C = 16; (x_orig,c, y_orig,c) represents the position of the upper-left corner of the cropping window in the current picture in units of single chroma samples of the current picture; (w_base,c, h_base,c) is the resolution of the base picture in units of single chroma samples of the base picture; (w_extract,c, h_extract,c) is the resolution of the cropping window in units of single chroma samples of the current picture; (p_base,x, p_base,y) represents the relative chroma phase shift of the base picture in units of quarter chroma samples of the base picture; and (p_enh,x, p_enh,y) represents the relative chroma phase shift of the current picture in units of quarter chroma samples of the current picture.
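A Python sketch of Equations 21 and 22 follows. The document only calls “//” a simplified division operator; this sketch assumes it means floor integer division, which is a guess. The helper `split` mirrors the later xInt = xf >> 4 and xFrac = xf % 16 steps (in Python, >> and % are both floor-based, so they stay consistent for negative positions). All function names are hypothetical.

```python
def pos_luma(x: int, x_orig: int, w_base: int, w_extract: int, r: int = 16) -> int:
    """Eq. (21): base-picture position of luma column x in 1/r sample units."""
    num = (x - x_orig) * w_base * r + (r // 2) * (w_base - w_extract)
    return num // w_extract  # '//' assumed to be floor division


def pos_chroma(xc: int, x_orig_c: int, w_base_c: int, w_extract_c: int,
               p_enh: int, p_base: int, r: int = 16) -> int:
    """Eq. (22); p_enh and p_base are chroma phase shifts in quarter samples."""
    num = ((xc - x_orig_c) * w_base_c * r
           + (r // 4) * (2 + p_enh) * w_base_c
           - (r // 4) * (2 + p_base) * w_extract_c)
    return num // w_extract_c


def split(pos: int):
    """Integer sample index and 1/16 phase of a position in 1/16 units."""
    return pos >> 4, pos % 16


# Dyadic example (176 -> 352): enhancement column 0 maps to base position -4/16.
print([pos_luma(x, 0, 176, 352) for x in range(3)], split(pos_luma(0, 0, 176, 352)))
```

The vertical positions use the same functions with heights in place of widths. With zero chroma phase shifts, Equation 22 reduces algebraically to the form of Equation 21, which the checks below exercise.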

A 6-tap filter can be selected from Table 2 or Table 3 based on the interpolation positions derived by Eqs. 21 and 22. In some embodiments, when the interpolation position is a half-sample position, the filter is the same as that defined in H.264 for half luma sample interpolation. Therefore, similar hardware/software modules can be applied for the technique of these embodiments of the present invention.
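The backward-compatibility claim can be checked mechanically. The Python sketch below encodes Tables 2 and 3 and verifies that every row has a gain of 32, that phase 8 equals the H.264 half-sample luma filter (1, −5, 20, 20, −5, 1), and that row 16 − p is the mirror image of row p. The helper `check_table` is hypothetical.

```python
TABLE2 = [
    (0, 0, 32, 0, 0, 0), (0, -2, 32, 2, 0, 0), (1, -3, 31, 4, -1, 0), (1, -4, 30, 7, -2, 0),
    (1, -4, 28, 9, -2, 0), (1, -5, 27, 11, -3, 1), (1, -5, 25, 14, -3, 0), (1, -5, 22, 17, -4, 1),
    (1, -5, 20, 20, -5, 1), (1, -4, 17, 22, -5, 1), (0, -3, 14, 25, -5, 1), (1, -3, 11, 27, -5, 1),
    (0, -2, 9, 28, -4, 1), (0, -2, 7, 30, -4, 1), (0, -1, 4, 31, -3, 1), (0, 0, 2, 32, -2, 0),
]
TABLE3 = [
    (0, 0, 32, 0, 0, 0), (0, -2, 32, 2, 0, 0), (1, -3, 31, 4, -1, 0), (1, -4, 30, 6, -1, 0),
    (1, -4, 28, 9, -2, 0), (1, -4, 27, 11, -3, 0), (1, -5, 25, 14, -3, 0), (1, -5, 22, 17, -4, 1),
    (1, -5, 20, 20, -5, 1), (1, -4, 17, 22, -5, 1), (0, -3, 14, 25, -5, 1), (0, -3, 11, 27, -4, 1),
    (0, -2, 9, 28, -4, 1), (0, -1, 6, 30, -4, 1), (0, -1, 4, 31, -3, 1), (0, 0, 2, 32, -2, 0),
]

# H.264 6-tap half-sample luma filter, gain 32.
H264_HALF_SAMPLE = (1, -5, 20, 20, -5, 1)


def check_table(table) -> bool:
    """Gain 32 per row, phase 8 equals H.264 half-sample, row 16-p mirrors row p."""
    gains_ok = all(sum(row) == 32 for row in table)
    half_ok = table[8] == H264_HALF_SAMPLE
    mirror_ok = all(table[16 - p] == table[p][::-1] for p in range(1, 16))
    return gains_ok and half_ok and mirror_ok


print(check_table(TABLE2), check_table(TABLE3))
```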

For inter-layer residual upsampling, similar direct interpolation methods can be used, but with bilinear interpolation filters instead of the 6-tap filters used for texture upsampling or the 4-tap filters described above.

In some exemplary embodiments, an interpolation process is as follows.

1. Define position (xP, yP) for the upper-left luma sample of a macroblock in the enhancement picture. When chroma_format_idc is not equal to 0, i.e., the chroma channels exist, define position (xC, yC) for the upper-left chroma samples of the same macroblock.

2. Derive the relative location of the macroblock in the base-layer picture:

$\begin{cases} xB = p_{x,L}(xP) \gg 4 \\ yB = p_{y,L}(yP) \gg 4 \end{cases} \qquad (23)$

$\begin{cases} xB1 = \left( p_{x,L}(xP + 15) + 15 \right) \gg 4 \\ yB1 = \left( p_{y,L}(yP + 15) + 15 \right) \gg 4 \end{cases} \qquad (24)$

and when chroma_format_idc is not equal to 0,

$\begin{cases} xCB = p_{x,C}(xC) \gg 4 \\ yCB = p_{y,C}(yC) \gg 4 \end{cases} \qquad (25)$

$\begin{cases} xCB1 = \left( p_{x,C}(xC + \mathrm{MbWidthC} - 1) + 15 \right) \gg 4 \\ yCB1 = \left( p_{y,C}(yC + \mathrm{MbHeightC} - 1) + 15 \right) \gg 4 \end{cases} \qquad (26)$

in which MbWidthC and MbHeightC represent the number of chroma samples per MB in the horizontal and vertical directions, respectively.
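Equations 23 to 26 bound the base-picture samples a macroblock needs: ">> 4" floors the first position, and adding 15 before ">> 4" rounds the last position up. The Python sketch below assumes floor division for “//” in Equation 21 and uses hypothetical helper names; passing `mb_w = MbWidthC` and `mb_h = MbHeightC` with chroma position functions gives Equations 25 and 26.

```python
def pos_luma(x, x_orig, w_base, w_extract, r=16):
    """Eq. (21), with '//' assumed to be floor division."""
    return ((x - x_orig) * w_base * r + (r // 2) * (w_base - w_extract)) // w_extract


def mb_base_bounds(xp, yp, px, py, mb_w=16, mb_h=16):
    """Eqs. (23)-(26): integer base bounds (xB, yB, xB1, yB1) of one macroblock.

    px and py map enhancement coordinates to 1/16-sample base positions.
    """
    xb, yb = px(xp) >> 4, py(yp) >> 4
    xb1 = (px(xp + mb_w - 1) + 15) >> 4
    yb1 = (py(yp + mb_h - 1) + 15) >> 4
    return xb, yb, xb1, yb1


# Dyadic example: 176x144 base picture, 352x288 enhancement picture, no cropping offset.
def px(v):
    return pos_luma(v, 0, 176, 352)


def py(v):
    return pos_luma(v, 0, 144, 288)


print(mb_base_bounds(16, 16, px, py))
```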

3. Texture Interpolation Process

Inputs to this Process Include

integer luma sample positions in base picture (xB, yB) and (xB1, yB1)

a luma sample array for the base picture base_L[x, y] with x = −2+xB . . . (xB1+2) and y = −2+yB . . . (yB1+2)

when chroma_format_idc is not equal to 0,

-   integer chroma sample positions in the base picture (xCB, yCB) and (xCB1, yCB1)
-   two chroma sample arrays for the base picture base_Cb[x, y] and base_Cr[x, y] with x = −2+xCB . . . (xCB1+2) and y = −2+yCB . . . (yCB1+2)

Outputs of This Process Include

a luma sample macroblock array pred_L[x, y] with x = 0 . . . 15 and y = 0 . . . 15

when chroma_format_idc is not equal to 0, two chroma sample macroblock arrays pred_Cb[x, y] and pred_Cr[x, y] with x = 0 . . . MbWidthC−1 and y = 0 . . . MbHeightC−1

The luma samples pred_L[x, y] with x = 0 . . . 15 and y = 0 . . . 15 are derived as follows.

Let temp_L[x, y] with x = −2+xB . . . (xB1+2) and y = 0 . . . 15 be a temporary luma sample array.

Each temp_L[x, y] with x = −2+xB . . . (xB1+2) and y = 0 . . . 15 is derived as follows:

-   The corresponding fractional-sample position yf in the base layer is derived as follows.

yf = p_{y,L}(y + yP)

-   Let yInt and yFrac be defined as follows.

yInt = (yf >> 4)

yFrac = yf % 16

-   Select a six-tap filter e[j] with j = −2 . . . 3 from Table 2 using yFrac as the phase, and derive temp_L[x, y] as

temp_L[x, y] = base_L[x, yInt−2]*e[−2] + base_L[x, yInt−1]*e[−1] + base_L[x, yInt]*e[0] + base_L[x, yInt+1]*e[1] + base_L[x, yInt+2]*e[2] + base_L[x, yInt+3]*e[3]

Each sample pred_L[x, y] with x = 0 . . . 15 and y = 0 . . . 15 is derived as follows:

-   The corresponding fractional-sample position xf in the base layer is derived as follows.

xf = p_{x,L}(x + xP)

-   Let xInt and xFrac be defined as follows.

xInt = (xf >> 4)

xFrac = xf % 16

-   Select a six-tap filter e[j] with j = −2 . . . 3 from Table 2 using xFrac as the phase, and derive pred_L[x, y] as

pred_L[x, y] = Clip1_Y((temp_L[xInt−2, y]*e[−2] + temp_L[xInt−1, y]*e[−1] + temp_L[xInt, y]*e[0] + temp_L[xInt+1, y]*e[1] + temp_L[xInt+2, y]*e[2] + temp_L[xInt+3, y]*e[3] + 512) / 1024)
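The two-pass luma procedure above (a vertical 6-tap pass into temp_L, then a horizontal 6-tap pass with +512 and division by 1024) can be sketched in Python as follows. Assumptions: “//” in Equation 21 is floor division, and base samples outside the picture are fetched by clamping coordinates (repetitive padding), which the text above does not spell out for this step. The chroma procedure is the same with p_{x,C}, p_{y,C}, Clip1_C and an MbWidthC×MbHeightC output. Function names are hypothetical.

```python
TABLE2 = [
    (0, 0, 32, 0, 0, 0), (0, -2, 32, 2, 0, 0), (1, -3, 31, 4, -1, 0), (1, -4, 30, 7, -2, 0),
    (1, -4, 28, 9, -2, 0), (1, -5, 27, 11, -3, 1), (1, -5, 25, 14, -3, 0), (1, -5, 22, 17, -4, 1),
    (1, -5, 20, 20, -5, 1), (1, -4, 17, 22, -5, 1), (0, -3, 14, 25, -5, 1), (1, -3, 11, 27, -5, 1),
    (0, -2, 9, 28, -4, 1), (0, -2, 7, 30, -4, 1), (0, -1, 4, 31, -3, 1), (0, 0, 2, 32, -2, 0),
]


def pos(v, orig, size_base, size_extract, r=16):
    """Eq. (21) for one direction; '//' assumed to be floor division."""
    return ((v - orig) * size_base * r + (r // 2) * (size_base - size_extract)) // size_extract


def clip1(v, bit_depth=8):
    return min(max(0, v), (1 << bit_depth) - 1)


def upsample_luma_mb(base, xp, yp, px, py, bit_depth=8):
    """Texture-upsample one 16x16 luma macroblock from base[y][x]."""
    h, w = len(base), len(base[0])

    def b(x, y):  # clamp to the picture (assumed repetitive padding)
        return base[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    xs = [px(xp + x) for x in range(16)]
    x_lo = min(xf >> 4 for xf in xs) - 2  # first base column a horizontal tap reads
    x_hi = max(xf >> 4 for xf in xs) + 3  # last base column a horizontal tap reads

    # Vertical pass: temp[y][x] keeps unrounded sums (gain 32).
    temp = []
    for y in range(16):
        yf = py(yp + y)
        y_int, e = yf >> 4, TABLE2[yf % 16]
        temp.append({x: sum(e[j] * b(x, y_int - 2 + j) for j in range(6))
                     for x in range(x_lo, x_hi + 1)})

    # Horizontal pass: total gain 32 * 32 = 1024, rounded with +512.
    pred = [[0] * 16 for _ in range(16)]
    for y in range(16):
        for x, xf in enumerate(xs):
            x_int, e = xf >> 4, TABLE2[xf % 16]
            acc = sum(e[j] * temp[y][x_int - 2 + j] for j in range(6))
            pred[y][x] = clip1((acc + 512) // 1024, bit_depth)
    return pred


# Usage: a flat 8x8 base picture upsampled 2x stays flat.
def px2(v):
    return pos(v, 0, 8, 16)


flat = [[77] * 8 for _ in range(8)]
mb = upsample_luma_mb(flat, 0, 0, px2, px2)
print(mb[0][:4])
```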

When chroma_format_idc is not equal to 0, the chroma samples pred_C[x, y] (with C being Cb or Cr) with x = 0 . . . MbWidthC−1 and y = 0 . . . MbHeightC−1 are derived as follows.

Let temp_Cb[x, y] and temp_Cr[x, y] with x = −2+xCB . . . (xCB1+2) and y = 0 . . . MbHeightC−1 be temporary chroma sample arrays.

Each temp_C[x, y] with C as Cb and Cr, x = −2+xCB . . . (xCB1+2), and y = 0 . . . MbHeightC−1 is derived as follows:

-   The corresponding fractional-sample position yfC in the base layer is derived as follows.

yfC = p_{y,C}(y + yC)

-   Let yIntC and yFracC be defined as follows.

yIntC = (yfC >> 4)

yFracC = yfC % 16

-   Select a six-tap filter e[j] with j = −2 . . . 3 from Table 2 using yFracC as the phase, and derive temp_C[x, y] as

temp_C[x, y] = base_C[x, yIntC−2]*e[−2] + base_C[x, yIntC−1]*e[−1] + base_C[x, yIntC]*e[0] + base_C[x, yIntC+1]*e[1] + base_C[x, yIntC+2]*e[2] + base_C[x, yIntC+3]*e[3]

Each sample pred_C[x, y] with C as Cb and Cr, x = 0 . . . MbWidthC−1, and y = 0 . . . MbHeightC−1 is derived as follows:

-   The corresponding fractional-sample position xfC in the base layer is derived as follows.

xfC = p_{x,C}(x + xC)

-   Let xIntC and xFracC be defined as follows.

xIntC = (xfC >> 4)

xFracC = xfC % 16

-   Select a six-tap filter e[j] with j = −2 . . . 3 from Table 2 using xFracC as the phase, and derive pred_C[x, y] as

pred_C[x, y] = Clip1_C((temp_C[xIntC−2, y]*e[−2] + temp_C[xIntC−1, y]*e[−1] + temp_C[xIntC, y]*e[0] + temp_C[xIntC+1, y]*e[1] + temp_C[xIntC+2, y]*e[2] + temp_C[xIntC+3, y]*e[3] + 512) / 1024)

4. Residual Interpolation Process

Inputs to This Process Include

integer luma sample positions in basePic (xB, yB) and (xB1, yB1)

a luma residual sample array resBase_L[x, y] with x = xB . . . xB1 and y = yB . . . yB1

when chroma_format_idc is not equal to 0,

-   integer chroma sample positions in basePic (xCB, yCB) and (xCB1, yCB1)
-   two chroma residual sample arrays resBase_Cb[x, y] and resBase_Cr[x, y] with x = xCB . . . xCB1 and y = yCB . . . yCB1

Outputs of This Process Include

a luma sample array resPred_(L)[x, y] with x=0 . . . 15 and y=0 . . . 15

when chroma_format_idc is not equal to 0, two chroma sample arrays resPred_Cb[x, y] and resPred_Cr[x, y] with x = 0 . . . MbWidthC−1 and y = 0 . . . MbHeightC−1

The luma residual samples resPred_L[x, y] with x = 0 . . . 15 and y = 0 . . . 15 are derived as follows.

Let temp_L[x, y] with x = xB . . . xB1 and y = 0 . . . 15 be a temporary luma sample array.

Each temp_L[x, y] with x = xB . . . xB1 and y = 0 . . . 15 is derived as follows:

-   The corresponding fractional-sample position yf in the base layer is derived as follows.

yf = p_{y,L}(y + yP)

-   Let yInt and yFrac be defined as follows.

yInt = (yf >> 4)

yFrac = yf % 16

-   Derive temp_L[x, y] as

temp_L[x, y] = resBase_L[x, yInt]*(16−yFrac) + resBase_L[x, yInt+1]*yFrac

Each residual sample resPred_L[x, y] with x = 0 . . . 15 and y = 0 . . . 15 is derived as follows:

-   The corresponding fractional-sample position xf in the base layer is derived as follows.

xf = p_{x,L}(x + xP)

-   Let xInt and xFrac be defined as follows.

xInt = (xf >> 4)

xFrac = xf % 16

-   Derive resPred_L[x, y] as

resPred_L[x, y] = Clip1_{Y,r}((temp_L[xInt, y]*(16−xFrac) + temp_L[xInt+1, y]*xFrac) / 256)

with

-   Clip1_{Y,r}(x) = Clip3(1−(1<<BitDepth_Y), (1<<BitDepth_Y)−1, x), in which BitDepth_Y represents the bit depth of the luma channel data.
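The bilinear residual procedure for one luma macroblock can be sketched in Python as below: weights (16 − yFrac, yFrac) vertically and (16 − xFrac, xFrac) horizontally, a total gain of 256 with no rounding offset, then Clip1_{Y,r}. Assumptions: “/” truncates toward zero as in H.264, and out-of-range samples are clamped to the picture, which ignores any transform-block padding. Names are hypothetical; the chroma case differs only in its position functions, block size and Clip1_{C,r}.

```python
def div_trunc(n, d):
    """Integer division truncating toward zero (assumed meaning of '/')."""
    q = abs(n) // abs(d)
    return q if (n >= 0) == (d > 0) else -q


def clip3(a, b, x):
    return min(max(a, x), b)


def upsample_residual_luma_mb(res, xp, yp, px, py, bit_depth=8):
    """Bilinear residual upsampling of one 16x16 luma macroblock from res[y][x]."""
    h, w = len(res), len(res[0])

    def r(x, y):  # clamp to the picture (simplifying assumption)
        return res[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    limit = (1 << bit_depth) - 1
    pred = [[0] * 16 for _ in range(16)]
    for y in range(16):
        yf = py(yp + y)
        y_int, y_frac = yf >> 4, yf % 16

        def temp(x):  # temp_L[x, y] of the vertical pass
            return r(x, y_int) * (16 - y_frac) + r(x, y_int + 1) * y_frac

        for x in range(16):
            xf = px(xp + x)
            x_int, x_frac = xf >> 4, xf % 16
            acc = temp(x_int) * (16 - x_frac) + temp(x_int + 1) * x_frac
            pred[y][x] = clip3(-limit, limit, div_trunc(acc, 256))
    return pred


# Usage: with a 1:1 mapping (phase always 0) the residual is reproduced exactly.
def identity_pos(v):
    return 16 * v


res = [[x - y for x in range(16)] for y in range(16)]
print(upsample_residual_luma_mb(res, 0, 0, identity_pos, identity_pos) == res)
```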

When chroma_format_idc is not equal to 0, the chroma residual samples resPred_C[x, y] (with C being Cb or Cr) with x = 0 . . . MbWidthC−1 and y = 0 . . . MbHeightC−1 are derived as follows.

Let temp_Cb[x, y] and temp_Cr[x, y] with x = xCB . . . xCB1 and y = 0 . . . MbHeightC−1 be temporary chroma sample arrays.

Each temp_C[x, y] with C as Cb and Cr, x = xCB . . . xCB1, and y = 0 . . . MbHeightC−1 is derived as follows:

-   The corresponding fractional-sample position yfC in the base layer is derived as follows.

yfC = p_{y,C}(y + yC)

-   Let yIntC and yFracC be defined as follows.

yIntC = (yfC >> 4)

yFracC = yfC % 16

-   Derive temp_C[x, y] as

temp_C[x, y] = resBase_C[x, yIntC]*(16−yFracC) + resBase_C[x, yIntC+1]*yFracC

Each sample resPred_C[x, y] with C as Cb and Cr, x = 0 . . . MbWidthC−1, and y = 0 . . . MbHeightC−1 is derived as follows:

-   The corresponding fractional-sample position xfC in the base layer is derived as follows.

xfC = p_{x,C}(x + xC)

-   Let xIntC and xFracC be defined as follows.

xIntC = (xfC >> 4)

xFracC = xfC % 16

-   Derive resPred_C[x, y] as

resPred_C[x, y] = Clip1_{C,r}((temp_C[xIntC, y]*(16−xFracC) + temp_C[xIntC+1, y]*xFracC) / 256)

with

-   Clip1_{C,r}(x) = Clip3(1−(1<<BitDepth_C), (1<<BitDepth_C)−1, x), in which BitDepth_C represents the bit depth of the chroma channel data.

Some embodiments of the present invention comprise a deblocking filter for spatial scalable video coding. In some of these embodiments, the filtering method is designed for the Scalable Video Coding (SVC) extension of H.264/MPEG-4 AVC, especially for the Extended Spatial Scalable (ESS) video coding feature adopted in April 2005 by JVT (Joint Video Team of MPEG and VCEG).

In prior methods, the filtering process was identical across all layers, which may have different spatial resolutions. A block coded using inter-layer texture prediction was treated as an intra-coded block during the filtering process. This prior method has two drawbacks when applied to spatial scalable coding: (1) the prediction from a lower-resolution layer can be unnecessarily blurred, and (2) the process spends more computational cycles than necessary.

Embodiments of the present invention may remove both of these drawbacks by skipping filter operations for some block boundaries, by applying different filters to different block boundaries, by varying the aggressiveness of a filter on different block boundaries, or by otherwise adjusting filter characteristics for specific block boundaries. As a result, these embodiments reduce the computational complexity and improve the quality of the upsampled pictures.

In these embodiments, blocks coded using inter-layer texture prediction are treated as inter blocks, so the filtering decisions of the existing AVC design for inter blocks are applied. In some embodiments, the adaptive block boundary filtering described above in relation to adjacent blocks with non-spatially-scalable coding may be applied to spatial scalable coding. These methods, adopted into H.264, may be applied to spatial scalable video coding.

In some embodiments of the present invention, a deblocking filter for an image block boundary can be characterized by a control parameter, Boundary Strength (Bs), which may have a value in the range of 0 to 4 or some other range. The higher the Bs value, the stronger the filter operation applied to the corresponding boundary. When Bs is equal to 0, the filter operation may be skipped or minimized.

In the current SVC design, a macroblock prediction mode based on inter-layer texture prediction is called I_BL mode. Using prior methods, all block boundaries related to an I_BL macroblock had to be filtered, i.e., with Bs > 0 for all block boundaries.

Embodiments of the present invention comprise a filter strength decision method for a block in I_BL mode for spatial scalable coding, i.e., when the SVC symbol SpatialScalabilityType is not equal to 0. The purpose is to reduce the computational complexity and avoid blurring the prediction from the base layer.

In some embodiments, for a block in I_BL mode, the Bs of a boundary between the block and a neighboring block may be derived as follows:

1. If the neighboring block is intra-coded but not in I_BL mode, the Bs is 4.

2. Otherwise, if any of the adjacent blocks has a non-zero coefficient, the Bs is 2.

3. Otherwise, if the neighboring block is not in I_BL mode based on the same base layer picture, the Bs is 1.

4. Otherwise, Bs is 0.
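Rules 1 to 4 can be read as an ordered decision chain. The Python sketch below encodes them with a hypothetical `Block` record; the field names, and the use of an integer `base_pic` to represent "the same base layer picture", are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    """Hypothetical per-block summary used only for this illustration."""
    intra: bool                     # any intra mode, including I_BL
    i_bl: bool                      # coded in I_BL mode
    nonzero_coeffs: bool            # has at least one non-zero coefficient
    base_pic: Optional[int] = None  # base layer picture an I_BL block predicts from


def bs_ibl(cur: Block, nbr: Block) -> int:
    """Bs for the boundary between I_BL block `cur` and neighbor `nbr` (rules 1-4)."""
    if not cur.i_bl:
        raise ValueError("rules 1-4 apply to a block in I_BL mode")
    if nbr.intra and not nbr.i_bl:                           # rule 1
        return 4
    if cur.nonzero_coeffs or nbr.nonzero_coeffs:             # rule 2
        return 2
    if not (nbr.i_bl and nbr.base_pic == cur.base_pic):      # rule 3
        return 1
    return 0                                                 # rule 4


ibl = Block(intra=True, i_bl=True, nonzero_coeffs=False, base_pic=0)
print(bs_ibl(ibl, Block(intra=False, i_bl=False, nonzero_coeffs=False)))
```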

In embodiments of the present invention related to the SVC extension of the JVT, if SpatialScalabilityType is not equal to 0 and either luma sample p₀ or q₀ is in a macroblock coded using the I_BL macroblock prediction mode, the variable bS is derived as follows:

If either luma sample p₀ or q₀ is in a macroblock coded using an intra prediction mode other than the I_BL mode, a value of bS equal to 4 shall be the output;

Otherwise, if one of the following conditions is true, a value of bS equal to 2 shall be the output:

-   i. the luma block containing sample p₀ or the luma block containing sample q₀ contains non-zero transform coefficient levels;
-   ii. the syntax element nal_unit_type is equal to 20, residual_prediction_flag is equal to 1 for the luma block containing sample p₀ or the luma block containing sample q₀, and the prediction array resPredX as derived in subclause S.8.5.14 contains non-zero samples, with X indicating the applicable component L, Cb, or Cr;

Otherwise, if one of the following conditions is true, a value of bS equal to 1 shall be the output:

-   i. either luma sample p₀ or q₀ is in a macroblock coded using an inter prediction mode;
-   ii. the luma samples p₀ and q₀ are in two separate slices with different base_id_plus1;

Otherwise, a value of bS equal to 0 shall be the output.

Otherwise (i.e., when the condition on SpatialScalabilityType and I_BL stated at the beginning of this derivation does not hold), if the samples p₀ and q₀ are both in macroblocks coded using the I_BL macroblock prediction mode, a value of bS equal to 1 shall be the output.
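The bS derivation above can be sketched as the following Python function. The `Side` record, its mode strings ("I_BL", "INTRA" for any other intra mode, "INTER") and the flag folding the nal_unit_type/residual_prediction_flag/resPredX condition into one boolean are hypothetical simplifications. The function returns `None` where the text falls back to the ordinary bS derivation, which is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Side:
    """Hypothetical summary of the block/macroblock containing p0 or q0."""
    mode: str               # "I_BL", "INTRA" (any other intra mode) or "INTER"
    nonzero_levels: bool    # luma block has non-zero transform coefficient levels
    res_pred_nonzero: bool  # nal_unit_type == 20, residual_prediction_flag == 1,
                            # and resPredX contains non-zero samples
    base_id_plus1: int      # base_id_plus1 of the slice containing the sample


def derive_bs(p: Side, q: Side, spatial_scalability_type: int) -> Optional[int]:
    """bS for the p0/q0 boundary; None means the ordinary derivation applies."""
    modes = (p.mode, q.mode)
    if spatial_scalability_type != 0 and "I_BL" in modes:
        if "INTRA" in modes:
            return 4
        if (p.nonzero_levels or q.nonzero_levels
                or p.res_pred_nonzero or q.res_pred_nonzero):
            return 2
        # Different base_id_plus1 values imply two separate slices.
        if "INTER" in modes or p.base_id_plus1 != q.base_id_plus1:
            return 1
        return 0
    if modes == ("I_BL", "I_BL"):
        return 1
    return None


ibl = Side("I_BL", False, False, 1)
print(derive_bs(ibl, Side("INTER", False, False, 1), 1))
```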

Some embodiments of the present invention may be described withreference to FIG. 28. In these embodiments the boundary betweenneighboring blocks within a spatial scalability enhancement layer may becharacterized for application of various filtering methods. Thesefiltering methods may be associated with a boundary strength indicator312, 316 & 320 that may be used to trigger various filtering methods orto adjust filtering parameters.

In these embodiments, the characteristics of two neighboring blocks,separated by a block boundary, are analyzed to characterize a blockboundary adjacent to the blocks. In some embodiments the boundarybetween the blocks is characterized.

In exemplary embodiments, the block characteristics are first analyzedto determine whether one of the blocks is encoded using inter-layertexture prediction 310. If at least one of said neighboring blocks isencoded using inter-layer texture prediction, the blocks are thenanalyzed to determine whether either block has been encoded with anintra-prediction method other than inter-layer texture prediction 311.If one of the blocks has been encoded with an intra-prediction methodother than inter-layer texture prediction, a first boundary strengthindicator is used to characterize the target boundary 312.

If neither block has been encoded with an intra-prediction method other than inter-layer texture prediction, the block characteristics are analyzed to determine whether either of the neighboring blocks, or a block from which one of the neighboring blocks was predicted, has non-zero transform coefficients 314. If so, a second boundary strength indicator is used to characterize the target boundary 316.

If neither block has been encoded with an intra-prediction method other than inter-layer texture prediction 311, and neither the neighboring blocks nor any block from which they were predicted has non-zero transform coefficients 314, it is determined whether the neighboring blocks are predicted with reference to different reference blocks 318. If they are, a third boundary strength indicator is used to characterize the target boundary 320.

If neither block has been encoded with an intra-prediction method other than inter-layer texture prediction 311, neither the neighboring blocks nor any block from which they were predicted has non-zero transform coefficients 314, and the neighboring blocks are not predicted with reference to different reference blocks 318, a fourth boundary strength indicator is used to characterize the target boundary 320.
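The FIG. 28 decision sequence can be sketched as a short function. This is a minimal Python illustration under stated assumptions, not the claimed implementation: the `BlockInfo` record and all of its field names are hypothetical, the coding-mode and coefficient flags are assumed to have been determined already, and the return values 1 to 4 merely number the first through fourth indicators rather than prescribing a filter strength. Returning `None` when neither block uses inter-layer texture prediction is also an assumption, since FIG. 28 is described only for the case where at least one block does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockInfo:
    # Hypothetical per-block summary; field names are illustrative only.
    uses_ibl: bool = False                # coded with inter-layer texture prediction (I_BL)
    uses_other_intra: bool = False        # coded with an intra mode other than I_BL
    has_nonzero_coeffs: bool = False      # block itself has non-zero transform coefficients
    ref_has_nonzero_coeffs: bool = False  # block it was predicted from has non-zero coefficients
    reference_id: Optional[int] = None    # assumed identifier of the reference block


def fig28_indicator(p: BlockInfo, q: BlockInfo) -> Optional[int]:
    """Return 1..4 for the first..fourth boundary strength indicator,
    or None when neither block uses I_BL (a case FIG. 28 does not cover)."""
    # Step 310: at least one block must use inter-layer texture prediction.
    if not (p.uses_ibl or q.uses_ibl):
        return None
    # Step 311: another intra-prediction mode on either side -> first indicator (312).
    if p.uses_other_intra or q.uses_other_intra:
        return 1
    # Step 314: non-zero coefficients in either block or its source block -> second indicator (316).
    if any(b.has_nonzero_coeffs or b.ref_has_nonzero_coeffs for b in (p, q)):
        return 2
    # Step 318: different reference blocks -> third indicator (320).
    if p.reference_id != q.reference_id:
        return 3
    # None of the above -> fourth indicator.
    return 4


if __name__ == "__main__":
    left = BlockInfo(uses_ibl=True)
    right = BlockInfo(uses_ibl=True, has_nonzero_coeffs=True)
    print(fig28_indicator(left, right))  # prints 2
```

Ordering the checks as early returns mirrors the flow of the text: each later test only runs when every earlier condition has failed.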

In some embodiments, the boundary strength indicator may be used to trigger specific boundary filtering options. In some embodiments, a different filtering method may be used for each indicator. In some embodiments, a filtering method parameter may be adjusted in relation to the indicator. In some embodiments, the indicator may determine how aggressively a boundary is filtered. In some exemplary embodiments, the first boundary strength indicator will trigger the most aggressive filtering of the boundary, and the second, third and fourth boundary strength indicators will trigger progressively less aggressive filtering, in that order. In some embodiments, the fourth boundary strength indicator or another indicator will trigger no filtering at all for the associated boundary.

Some embodiments of the present invention may be described with reference to FIG. 29. In these embodiments, the boundary between neighboring blocks within a spatial scalability enhancement layer may be characterized for application of various filtering methods. These filtering methods may be associated with boundary strength indicators 336, 340, 344, 348 & 352 that may be used to trigger various filtering methods or to adjust filtering parameters.

In these embodiments, the characteristics of two neighboring blocks, separated by a block boundary, are analyzed to characterize a block boundary adjacent to the blocks. In some embodiments, the boundary between the blocks is characterized.

In exemplary embodiments, the block characteristics are first analyzed to determine whether the blocks are in a spatial scalability layer 330. It is then determined whether at least one of the blocks is encoded using inter-layer texture prediction 332. If at least one of the neighboring blocks is encoded using inter-layer texture prediction, the blocks are then analyzed to determine whether either block has been encoded with an intra-prediction method other than inter-layer texture prediction 334. If one of the blocks has been encoded with an intra-prediction method other than inter-layer texture prediction, a first boundary strength indicator is used to characterize the target boundary 336.

If neither block has been encoded with an intra-prediction method other than inter-layer texture prediction, the block characteristics are analyzed to determine whether either of the neighboring blocks has non-zero transform coefficients 338. If either of the neighboring blocks has non-zero transform coefficients, a second boundary strength indicator is used to characterize the target boundary 340.

If neither block has been encoded with an intra-prediction method other than inter-layer texture prediction and neither neighboring block has non-zero transform coefficients 338, the block characteristics may be analyzed to determine whether a block from which one of the neighboring blocks was predicted has non-zero transform coefficients 342. If so, a third boundary strength indicator is used to characterize the target boundary 344.

If neither block has been encoded with an intra-prediction method other than inter-layer texture prediction 334, and neither the neighboring blocks nor any block from which they were predicted has non-zero transform coefficients 338, 342, it is determined whether either of the neighboring blocks is encoded using an inter-prediction mode 346. If so, a fourth boundary strength indicator may be used to characterize the target boundary 348.

If neither block has been encoded with an intra-prediction method other than inter-layer texture prediction 334, neither the neighboring blocks nor any block from which they were predicted has non-zero transform coefficients 338, 342, and neither neighboring block is encoded using an inter-prediction mode 346, it may be determined whether the neighboring blocks are predicted with reference to different reference blocks 350. If so, a fifth boundary strength indicator is used to characterize the target boundary 352.

If neither block has been encoded with an intra-prediction method other than inter-layer texture prediction 334, neither the neighboring blocks nor any block from which they were predicted has non-zero transform coefficients 338, 342, neither block is encoded in an inter-prediction mode 346, and the neighboring blocks are not predicted with reference to different reference blocks 350, a sixth boundary strength indicator may be used to characterize the target boundary 354.
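The FIG. 29 flow differs from FIG. 28 mainly in splitting the coefficient test into two steps (338, 342) and adding an inter-prediction test (346) ahead of the reference-block test (350). The sketch below is a hypothetical Python illustration of that ordering, assuming the same kind of pre-computed per-block flags: `BlockInfo`, its fields and the `in_spatial_layer` parameter are illustrative names, and the return values 1 to 6 only number the first through sixth indicators. Returning `None` when step 330 or 332 fails is an assumption, because the text does not describe those branches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockInfo:
    # Hypothetical per-block summary; field names are illustrative only.
    uses_ibl: bool = False                # inter-layer texture prediction (I_BL)
    uses_other_intra: bool = False        # intra mode other than I_BL
    uses_inter: bool = False              # inter-prediction mode
    has_nonzero_coeffs: bool = False      # block itself has non-zero coefficients
    ref_has_nonzero_coeffs: bool = False  # block it was predicted from has non-zero coefficients
    reference_id: Optional[int] = None    # assumed identifier of the reference block


def fig29_indicator(p: BlockInfo, q: BlockInfo, in_spatial_layer: bool) -> Optional[int]:
    """Return 1..6 for the first..sixth boundary strength indicator,
    or None when step 330 or 332 is not satisfied."""
    if not in_spatial_layer:                                  # step 330
        return None
    if not (p.uses_ibl or q.uses_ibl):                        # step 332
        return None
    if p.uses_other_intra or q.uses_other_intra:              # step 334 -> indicator 336
        return 1
    if p.has_nonzero_coeffs or q.has_nonzero_coeffs:          # step 338 -> indicator 340
        return 2
    if p.ref_has_nonzero_coeffs or q.ref_has_nonzero_coeffs:  # step 342 -> indicator 344
        return 3
    if p.uses_inter or q.uses_inter:                          # step 346 -> indicator 348
        return 4
    if p.reference_id != q.reference_id:                      # step 350 -> indicator 352
        return 5
    return 6                                                  # indicator 354
```

For example, `fig29_indicator(BlockInfo(uses_ibl=True), BlockInfo(uses_inter=True), True)` reaches step 346 and returns 4.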

Some embodiments of the present invention may be described with reference to FIG. 30. In these embodiments, the boundary between neighboring blocks within a spatial scalability enhancement layer may be characterized for application of various filtering methods. These filtering methods may be associated with boundary strength indicators 365, 367, 371 & 373 that may be used to trigger various filtering methods or to adjust filtering parameters. In some embodiments, a boundary strength indicator of 0 indicates that the filter operation is skipped.

In these embodiments, the characteristics of two neighboring blocks, separated by a block boundary, are analyzed to characterize a block boundary adjacent to the blocks. In some embodiments, the boundary between the blocks is characterized.

In these embodiments, SpatialScalabilityType must be non-zero 360. It is then determined whether a luma sample from one of the blocks is encoded using inter-layer texture prediction (I_BL) 362. If at least one of the neighboring blocks is encoded using I_BL, the blocks are then analyzed to determine whether either block has been encoded with an intra-prediction method other than I_BL 364. If one of the blocks has been encoded with an intra-prediction method other than I_BL, a first boundary strength indicator is used to characterize the target boundary 365. In some embodiments, the first boundary strength indicator will trigger the strongest or most aggressive deblocking filter operation. In some embodiments, this first indicator will be equal to 4.

If neither block has been encoded with an intra-prediction method other than I_BL, the block characteristics are analyzed to determine whether the luma samples of either of the neighboring blocks have non-zero transform coefficients 366. If so, a second boundary strength indicator is used to characterize the target boundary 367. In some embodiments, this second boundary strength indicator will trigger an intermediate or second most aggressive deblocking filter operation. In some embodiments, this second indicator will be equal to 2.

If neither block has been encoded with an intra-prediction method other than I_BL 364 and none of the luma samples from either block have non-zero transform coefficients 366, it may be determined whether a block from which one of the neighboring blocks was predicted has non-zero transform coefficients 368. If so, the second boundary strength indicator may again be used to characterize the target boundary 367.

If neither block has been encoded with an intra-prediction method other than I_BL 364, and neither the neighboring blocks 366 nor any block from which they were predicted has non-zero transform coefficients 368, it may be determined whether the luma samples of either neighboring block are encoded using an inter-prediction mode 370. If so, a third boundary strength indicator may be used to characterize the target boundary 371. In some embodiments, this third boundary strength indicator will trigger a weaker or third most aggressive deblocking filter operation. In some embodiments, this third indicator will be equal to 1.

If neither block has been encoded with an intra-prediction method other than I_BL 364, neither the neighboring blocks 366 nor any block from which they were predicted has non-zero transform coefficients 368, and the luma samples of the neighboring blocks are not encoded in an inter-prediction mode 370, it may be determined whether the luma samples of the neighboring blocks are predicted from different reference blocks 372. If the luma samples of the neighboring blocks are predicted with reference to different reference blocks 372, the third boundary strength indicator may again be used to characterize the target boundary 371.

If neither block has been encoded with an intra-prediction method other than I_BL 364, neither the neighboring blocks 366 nor any block from which they were predicted has non-zero transform coefficients 368, the luma samples of the neighboring blocks are not encoded in an inter-prediction mode 370, and the luma samples from the neighboring blocks are not predicted from different reference blocks 372, a fourth boundary strength indicator may be used to characterize the target boundary 373. In some embodiments, this fourth boundary strength indicator may trigger the weakest or fourth most aggressive deblocking filter operation. In some embodiments, this fourth indicator may indicate that no filtering should take place. In some embodiments, this fourth indicator will be equal to 0.
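Using the example values given for FIG. 30 (4, 2, 1 and 0), the flow can be sketched as a function that returns a boundary strength directly. This is a hypothetical Python illustration, not a codec implementation: `LumaBlockInfo` and its fields are assumed summaries of each block's luma samples, `spatial_scalability_type` stands in for the SpatialScalabilityType value named in the text, and returning `None` when step 360 or 362 fails is an assumption because those branches are not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LumaBlockInfo:
    # Hypothetical summary of one block's luma samples; names are illustrative.
    uses_ibl: bool = False                # inter-layer texture prediction (I_BL)
    uses_other_intra: bool = False        # intra mode other than I_BL
    uses_inter: bool = False              # inter-prediction mode
    has_nonzero_coeffs: bool = False      # non-zero transform coefficients in this block
    ref_has_nonzero_coeffs: bool = False  # non-zero coefficients in the block it was predicted from
    reference_id: Optional[int] = None    # assumed identifier of the reference block


def fig30_bs(p: LumaBlockInfo, q: LumaBlockInfo,
             spatial_scalability_type: int) -> Optional[int]:
    """Return 4, 2, 1 or 0 (the example values for FIG. 30),
    or None when step 360 or 362 is not satisfied."""
    if spatial_scalability_type == 0:                         # step 360
        return None
    if not (p.uses_ibl or q.uses_ibl):                        # step 362
        return None
    if p.uses_other_intra or q.uses_other_intra:              # step 364 -> 365
        return 4
    if p.has_nonzero_coeffs or q.has_nonzero_coeffs:          # step 366 -> 367
        return 2
    if p.ref_has_nonzero_coeffs or q.ref_has_nonzero_coeffs:  # step 368 -> 367
        return 2
    if p.uses_inter or q.uses_inter:                          # step 370 -> 371
        return 1
    if p.reference_id != q.reference_id:                      # step 372 -> 371
        return 1
    return 0                                                  # 373: filtering may be skipped
```

A caller could then bypass the deblocking filter for any boundary where the result is 0, consistent with the embodiments in which an indicator of 0 means the filter operation is skipped.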

For the sake of convenience, the operations are described as various interconnected functional blocks or distinct software modules. This is not necessary, however, and there may be cases where these functional blocks or modules are equivalently aggregated into a single logic device, program or operation with unclear boundaries. In any event, the functional blocks and software modules or described features can be implemented by themselves, or in combination with other operations in either hardware or software.

The terms and expressions which have been employed in the foregoing specification are used therein as terms of description and not of limitation, and there is no intention in the use of such terms and expressions of excluding equivalents of the features shown and described or portions thereof, it being recognized that the scope of the invention is defined and limited only by the claims which follow.

1. A method for characterization of a block boundary between neighboring blocks within a spatial scalability enhancement layer wherein at least one of said neighboring blocks is encoded using inter-layer texture prediction (I_BL), said method comprising: a) characterizing said block boundary with a first boundary strength indicator when a luma sample from one of said neighboring blocks is encoded using an intra-prediction mode other than said I_BL mode; b) characterizing said block boundary with a second boundary strength indicator when, i) no luma sample from each of said neighboring blocks is encoded using an intra-prediction mode other than said I_BL mode; and ii) any of said neighboring blocks and blocks from which said neighboring blocks are predicted have non-zero transform coefficients; c) characterizing said block boundary with a third boundary strength indicator when, i) no luma sample from each of said neighboring blocks is encoded using an intra-prediction mode other than said I_BL mode; and ii) all of said neighboring blocks and blocks from which said neighboring blocks are predicted have no transform coefficients.
2. A method as described in claim 1 wherein said first boundary strength indicator triggers more aggressive smoothing than said second boundary strength indicator, and said second boundary strength indicator triggers more aggressive smoothing than said third boundary strength indicator when applying a deblocking filter to said block boundary.